Methods and apparatus to deploy workload domains in virtual server racks

ABSTRACT

Methods and apparatus to deploy workload domains in virtual server racks are disclosed. An example apparatus includes a policy manager to determine an availability option and a performance option of the workload domain based on a domain type and determine capacity options based on a user selection of the availability option and a user selection of the performance option by a first user, a deployment manager to identify first ones of a plurality of computing resources to form a placement solution for the workload domain based on the user selection of the availability and performance options, and based on a user selection of one of the determined capacity options by the first user, the plurality of computing resources stored in a resource database accessible by the first user and a second user, and a resource manager to reserve the first ones of the plurality of computing resources to deploy the workload domain for the first user.

RELATED APPLICATIONS

This patent arises from a continuation of U.S. patent application Ser. No. 17/581,157 (now U.S. patent Ser. No. ______), which was filed on Jan. 21, 2022, and is a continuation of U.S. patent application Ser. No. 15/280,348 (now U.S. Pat. No. 11,263,006), which was filed on Sep. 29, 2016, and claims the benefit of U.S. Provisional Patent Application No. 62/259,415, filed Nov. 24, 2015, entitled “METHODS AND APPARATUS TO DEPLOY AND MANAGE WORKLOAD DOMAINS IN VIRTUAL SERVER RACKS,” and claims the benefit of U.S. Provisional Patent Application No. 62/354,038, filed Jun. 23, 2016, entitled “METHODS AND APPARATUS TO DEPLOY AND MANAGE WORKLOAD DOMAINS IN VIRTUAL SERVER RACKS.” U.S. patent application Ser. No. 17/581,157; U.S. patent application Ser. No. 15/280,348; U.S. Provisional Patent Application No. 62/259,415; and U.S. Provisional Patent Application No. 62/354,038 are hereby incorporated by reference herein in their entireties. Priority to U.S. patent application Ser. No. 17/581,157; U.S. patent application Ser. No. 15/280,348; U.S. Provisional Patent Application No. 62/259,415; and U.S. Provisional Patent Application No. 62/354,038 is hereby claimed.

FIELD OF THE DISCLOSURE

The present disclosure relates generally to cloud computing and, more particularly, to methods and apparatus to manage workload domains in virtual server racks.

BACKGROUND

The virtualization of computer systems provides numerous benefits such as the execution of multiple computer systems on a single hardware computer, the replication of computer systems, the extension of computer systems across multiple hardware computers, etc. “Infrastructure-as-a-Service” (also commonly referred to as “IaaS”) generally describes a suite of technologies provided by a service provider as an integrated solution to allow for elastic creation of a virtualized, networked, and pooled computing platform (sometimes referred to as a “cloud computing platform”). Enterprises may use IaaS as a business-internal organizational cloud computing platform (sometimes referred to as a “private cloud”) that gives an application developer access to infrastructure resources, such as virtualized servers, storage, and networking resources. By providing ready access to the hardware resources required to run an application, the cloud computing platform enables developers to build, deploy, and manage the lifecycle of a web application (or any other type of networked application) at a greater scale and at a faster pace than ever before.

Cloud computing environments may be composed of many processing units (e.g., servers). The processing units may be installed in standardized frames, known as racks, which provide efficient use of floor space by allowing the processing units to be stacked vertically. The racks may additionally include other components of a cloud computing environment such as storage devices, networking devices (e.g., switches), etc.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts example processes that may be used to deploy virtual rack servers for use in examples disclosed herein to deploy and manage workload domains in such virtual server racks.

FIG. 2 depicts example physical racks in an example virtual server rack deployment.

FIG. 3 depicts an example configuration of one of the example physical racks of FIG. 2.

FIG. 4 depicts an example architecture to configure and deploy the example virtual server rack of FIG. 2.

FIG. 5 depicts the example hardware management system (HMS) of FIGS. 2-4 interfacing between the example hardware and an example virtual resource manager (VRM) of FIGS. 2 and 4.

FIG. 6 depicts an example hardware management application program interface (API) of the HMS of FIGS. 2-5 that is between example hardware resources and an example physical rack resource manager (PRM).

FIG. 7 depicts the example virtual server rack of FIG. 2 with aggregate capacity across physical racks.

FIG. 8 depicts example management clusters in corresponding ones of the example physical racks of FIG. 2.

FIG. 9 depicts two example workload domains executing on the virtual server rack of FIGS. 2 and 7.

FIG. 10 is a block diagram of the example operations and management component of FIGS. 4, 5, 7, and 9.

FIGS. 11A and 11B depict flowcharts representative of computer readable instructions that may be executed to implement the example operations and management component of FIG. 10 to deploy workload domains.

FIGS. 12A-12D depict flowcharts representative of computer readable instructions that may be executed to implement the example operations and management component of FIG. 10 to manage workload domains.

FIG. 13 depicts example availability options for configuring workload domains.

FIG. 14 depicts additional example policy settings for configuring workload domains.

FIG. 15 depicts an example resource selection user interface screen for selecting resources for use based on performance options in a workload domain.

FIG. 16 depicts an example performance and availability selection user interface screen for selecting performance and availability for use in a workload domain.

FIG. 17 depicts an example network configuration user interface screen for selecting network configurations for use with a workload domain.

FIG. 18 is a block diagram of an example processing platform capable of executing the example machine-readable instructions of FIGS. 11A and 11B to deploy workload domains and/or the example machine-readable instructions of FIGS. 12A-12D to manage workload domains.

DETAILED DESCRIPTION

Cloud computing is based on the deployment of many physical resources across a network, virtualizing the physical resources into virtual resources, and provisioning the virtual resources for use across cloud computing services and applications. Example systems for virtualizing computer systems are described in U.S. patent application Ser. No. 11/903,374, entitled “METHOD AND SYSTEM FOR MANAGING VIRTUAL AND REAL MACHINES,” filed Sep. 21, 2007, and granted as U.S. Pat. No. 8,171,485, U.S. Provisional Patent Application No. 60/919,965, entitled “METHOD AND SYSTEM FOR MANAGING VIRTUAL AND REAL MACHINES,” filed Mar. 26, 2007, and U.S. Provisional Patent Application No. 61/736,422, entitled “METHODS AND APPARATUS FOR VIRTUALIZED COMPUTING,” filed Dec. 12, 2012, all three of which are hereby incorporated herein by reference in their entirety.

When starting up a cloud computing environment or adding resources to an already established cloud computing environment, data center operators struggle to offer cost-effective services while making resources of the infrastructure (e.g., storage hardware, computing hardware, and networking hardware) work together to achieve pain-free installation/operation and to optimize the resources for improved performance. Prior techniques for establishing and maintaining data centers to provide cloud computing services often require customers to understand details and configurations of hardware resources to establish workload domains in which to execute customer services. In examples disclosed herein, workload domains are mapped to a management cluster deployment (e.g., a vSphere cluster of VMware, Inc.) in a single rack deployment in a manner that is relatively easier to understand and operate by users than prior techniques. In this manner, as additional racks are added to a system, cross-rack clusters become an option. This enables creating more complex configurations for workload domains as there are more options for deployment as well as additional management cluster capabilities that can be leveraged. Examples disclosed herein facilitate making workload domain configuration and management easier than prior techniques.

A management cluster is a group of physical machines and virtual machines (VMs) that host core cloud infrastructure components necessary for managing a software defined data center (SDDC) in a cloud computing environment that supports customer services. Cloud computing allows ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources. A cloud computing customer can request allocations of such resources to support services required by those customers. For example, when a customer requests to run one or more services in the cloud computing environment, one or more workload domains may be created based on resources in the shared pool of configurable computing resources. Examples disclosed herein enable customers to define different domain types, security, capacity, availability, and performance requirements for establishing workload domains in server rack deployments without requiring the users to have in-depth knowledge of server rack hardware and configurations.

As used herein, availability refers to the level of redundancy required to provide continuous operation expected for the workload domain. As used herein, performance refers to the central processing unit (CPU) operating speeds (e.g., CPU gigahertz (GHz)), memory (e.g., gigabytes (GB) of random access memory (RAM)), mass storage (e.g., GB hard disk drive (HDD), GB solid state drive (SSD)), and power capabilities of a workload domain. As used herein, capacity refers to the aggregate number of resources (e.g., aggregate storage, aggregate CPU, etc.) across all servers associated with a cluster and/or a workload domain. In examples disclosed herein, the number of resources (e.g., capacity) for a workload domain is determined based on the redundancy, the CPU operating speed, the memory, the storage, the security, and/or the power requirements selected by a user. For example, more resources are required for a workload domain as the user-selected requirements increase (e.g., higher redundancy, CPU speed, memory, storage, security, and/or power options require more resources than lower redundancy, CPU speed, memory, storage, security, and/or power options). In some examples, resources are computing devices with set amounts of storage, memory, CPUs, etc. In some examples, resources are individual devices (e.g., hard drives, processors, memory chips, etc.).
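The sizing relationship described above can be pictured with a minimal sketch. The per-host profile, the redundancy-to-spare-host mapping, and all names and numbers below are illustrative assumptions rather than values from this disclosure; the sketch only shows how higher user-selected availability and performance requirements translate into a larger host (capacity) count.

```python
# Illustrative sketch only: estimating how many hosts a workload domain needs
# from user-selected availability and performance options. All values assumed.
import math

HOST_PROFILE = {"cpu_ghz": 32.0, "memory_gb": 256, "storage_gb": 4000}  # assumed per-host capacity

# Assumed mapping from the chosen redundancy level to extra (spare) hosts.
REDUNDANCY_OVERHEAD = {"none": 0, "normal": 1, "high": 2}

def required_hosts(cpu_ghz, memory_gb, storage_gb, availability="normal"):
    """Return the smallest host count that covers the requested capacity plus redundancy."""
    by_cpu = math.ceil(cpu_ghz / HOST_PROFILE["cpu_ghz"])
    by_mem = math.ceil(memory_gb / HOST_PROFILE["memory_gb"])
    by_storage = math.ceil(storage_gb / HOST_PROFILE["storage_gb"])
    base = max(by_cpu, by_mem, by_storage, 1)
    return base + REDUNDANCY_OVERHEAD[availability]

print(required_hosts(cpu_ghz=128, memory_gb=1024, storage_gb=12000, availability="high"))  # -> 6
```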

Examples disclosed herein support numerous options and configuration capabilities for deploying workload domains. For example, numerous options for domain type, security, availability, performance, and capacity are supported for configuring workload domains. In addition, examples disclosed herein are able to support any of a number of user-requested capacities for workload domains. That is, examples disclosed herein may be implemented to inform a user of user-selectable capacities that may be used for configuring workload domains in particular rack deployments. In this manner, users' selections of capacities are based on capacities useable for configuring workload domains in particular rack deployments. That is, users are better informed of capacity capabilities of rack deployments to avoid confusion and incorrect parameters during workload domain configuration and management. Examples disclosed herein also enable deploying workload domains using optimal configurations that meet user-requested domain type, security, capacity, availability, and performance configurations. In addition, examples disclosed herein enable generating expandable workload domains that maintain initial user-requested domain type, security, capacity, availability, and performance requirements until users request modifications to such initial user-requested capabilities.

FIG. 1 depicts example processes 102 and 104 that may be used to deploy virtual rack servers for use in examples disclosed herein to deploy and manage workload domains in such virtual server racks. For example, the processes 102, 104 of FIG. 1 may be used to prepare example physical racks 202, 204 of FIG. 2 to deploy example virtual server rack 206 of FIG. 2. In the illustrated example, the process 102 is a partner process that is implemented by a system integrator to prepare the physical racks 202, 204 for distribution to a customer. For example, a system integrator receives and fulfills customer orders for computing hardware. The system integrator obtains computer hardware and/or software from other suppliers (e.g., hardware supplier(s)), and assembles individual hardware components and/or software into functional computing units to fulfill customer orders. Alternatively, a system integrator may design and/or build some or all of the hardware components and/or software to be used in assembling computing units. According to the illustrated example, the system integrator prepares computing units for other entities (e.g., businesses and/or persons that do not own/employ and are not owned/employed by the system integrator). Alternatively, a system integrator may assemble computing units for use by the same entity as the system integrator (e.g., the system integrator may be a department of a company, wherein the company orders and/or utilizes the assembled computing units). In some examples, a system integrator is an entity independent of equipment manufacturers such as white-label equipment manufacturers that provide hardware without branding. In other examples, a system integrator is an original equipment manufacturer (OEM) partner or original device manufacturer (ODM) partner that partners with OEMs or ODMs (e.g., non-white label equipment manufacturers) that provide brand-labeled hardware. Example OEM/ODM hardware includes OEM/ODM servers such as Hewlett-Packard® (HP) servers and Lenovo® servers, and OEM/ODM switches such as Arista switches, and/or any other OEM/ODM servers, switches, or equipment that are labeled by the original manufacturers.

The example process 104 is to be performed by a customer to start up the physical racks 202, 204 (FIG. 2) prepared by the system integrator to deploy the virtual server rack 206 (FIG. 2) at the customer's site. As used herein, the term customer refers to any person and/or entity that receives and/or operates the computing units supplied by a system integrator. The example process 102 is implemented by a system integrator to assemble and configure the physical racks 202, 204 ordered by a customer. For example, the physical racks 202, 204 are a combination of computing hardware and installed software that may be utilized by a customer to create and/or add to a virtual computing environment. For example, the physical racks 202, 204 may include processing units (e.g., multiple blade servers), network switches to interconnect the processing units and to connect the physical racks 202, 204 with other computing units (e.g., other physical racks in a network environment such as a cloud computing environment), and/or data storage units (e.g., network attached storage, storage area network hardware, etc.). The example physical racks 202, 204 of FIG. 2 are prepared by the system integrator in a partially configured state to enable the computing devices to be rapidly deployed at a customer location (e.g., in less than 2 hours). For example, the system integrator may install operating systems, drivers, operations software, management software, etc. The installed components may be configured with some system details (e.g., system details to facilitate intercommunication between the components of the physical racks 202, 204) and/or may be prepared with software to collect further information from the customer when the virtual server rack is installed and first powered on by the customer.

Initially in the illustrated example of FIG. 1, a system integrator partner selects a qualified hardware/software bill of materials (BoM) (block 108) for use in building the physical racks 202, 204. The system integrator partner then assembles the hardware for the physical racks 202, 204 (block 110). The system integrator partner uses a virtual imaging appliance (VIA) to image the physical racks 202, 204 (block 112).

For example, to facilitate preparation of the physical racks 202, 204 for distribution to a customer, the example system integrator uses the VIA to prepare and configure the operating systems, system configurations, software, etc. on the physical racks 202, 204 prior to shipping the example physical racks 202, 204 to the customer. The VIA of the illustrated example is a virtual computing appliance provided to the system integrator by an example virtual system solutions provider via a network. The VIA is executed by the system integrator in a virtual computing environment of the system integrator. For example, the VIA may be a virtual computing image, a virtual application, a container virtual machine image, a software application installed in an operating system of a computing unit of the system integrator, etc. The VIA may alternatively be provided by any other entity and/or may be a physical computing device, may be multiple physical computing devices, and/or may be any combination of virtual and physical computing components.

The VIA used in the illustrated example retrieves software images and configuration data from the virtual systems solutions provider via the network for installation on the physical racks 202, 204 during preparation of the physical racks 202, 204. The VIA used in the illustrated example pushes (e.g., transmits, sends, etc.) the software images and configuration data to the components of the physical racks 202, 204. For example, the VIA used in the illustrated example includes multiple network connections (e.g., virtual network connections, physical network connections, and/or any combination of virtual and physical network connections). For example, the VIA connects to a management interface of a network switch(es) installed in the physical racks 202, 204, installs network configuration information on the network switch(es), and reboots the switch(es) to load the installed configuration to communicatively couple the VIA with the computing unit(s) communicatively coupled via the network switch(es). The VIA also connects to a management network interface (e.g., an out of band (OOB) interface) of a server(s) installed in the example physical racks 202, 204 to cause an operating system(s) to be installed (e.g., utilizing a preboot execution environment (PXE) boot of an operating system installer). The VIA is also used to install virtual environment management components (described in further detail in conjunction with FIGS. 3-6 and in the following pages) and causes the virtual environment management components to boot so that they can take over the deployment of the example physical racks 202, 204.
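The imaging sequence described above can be summarized in an illustrative sketch. Every object, method, and ordering detail below is a hypothetical placeholder (this disclosure does not define such an API); the sketch merely restates the flow of configuring the switches, PXE-booting the servers, and starting the management components.

```python
# Illustrative sketch only: one possible ordering of the imaging steps performed
# by a VIA, expressed with hypothetical, duck-typed objects.

def image_physical_rack(rack, image_repo):
    # 1. Configure the rack's network switches through their management interfaces.
    for switch in rack.switches:
        config = image_repo.fetch_switch_config(switch.model)
        switch.management_interface.install_config(config)
        switch.reboot()  # load the installed configuration

    # 2. Cause an OS installer to PXE-boot on each server via its OOB interface.
    for server in rack.servers:
        server.oob_interface.set_boot_order(["pxe"])
        server.oob_interface.power_cycle()  # triggers the PXE boot of the OS installer

    # 3. Install and boot the virtual environment management components so they
    #    can take over the remainder of the deployment.
    for component in image_repo.fetch_management_components():
        rack.install(component)
        rack.boot(component)
```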

A virtual system solutions provider that provides the VIA to the system integrator partner is a business, such as VMware, Inc., that distributes (e.g., sells) the VIA. The virtual system solutions provider also provides a repository of images and/or other types of software (e.g., virtual machine images, drivers, operating systems, etc.) that may be retrieved by the VIA and installed on the physical racks 202, 204. The virtual system solutions provider may alternatively be implemented by multiple entities (e.g., from a manufacturer(s) of the software) and/or any other type of entity. Additional details of example VIAs are disclosed in U.S. patent application Ser. No. 14/752,699, filed on Jun. 26, 2015, and titled “Methods and Apparatus for Rack Deployments for Virtual Computing Environments,” which is hereby incorporated by reference herein in its entirety.

After imaging the physical racks 202, 204 at block 112, the system integrator ships and/or otherwise delivers the physical racks 202, 204 to the customer (block 114). Thus, the physical racks 202, 204 have been pre-configured to allow the customer to power on the example physical racks 202, 204 and quickly prepare the physical racks 202, 204 for installation in a new and/or existing computing system (e.g., a cloud computing system).

Turning now to the example process 104, the physical racks 202, 204 initially arrive at the customer site from the system integrator and the customer connects the physical racks 202, 204 to a network and powers the physical racks 202, 204 (block 116). For example, upon initially powering on the example physical racks 202, 204, the components of the example physical racks 202, 204 are already configured to communicate with each other and execute operating systems and software, which allows the example physical racks 202, 204 to provide an interface (e.g., a webpage interface) that, when accessed by the customer or an installer, gathers additional information for completing the configuration of the physical racks 202, 204. For example, the interface may gather and/or configure user credentials, network information, information about networked components (e.g., an address for a storage device such as a storage area network (SAN), an address for a management system (e.g., a VMware vCenter server(s)), etc.). The gathered information can be utilized by the components of the example physical racks 202, 204 to set up the physical racks 202, 204 as part of a new computing cluster and/or add the example physical racks 202, 204 to an existing computing cluster (e.g., a cloud computing system). For example, the customer may specify different domain types, security, capacity, availability, and performance requirements for establishing workload domains in the virtual server rack 206 (FIG. 2) without requiring the customer to have in-depth knowledge of the hardware and configurations of the physical racks 202, 204.

After the customer powers on the physical racks 202, 204 at block 116, hardware management systems (HMSs) 208, 214 (FIG. 2) of the physical racks 202, 204 auto discover hardware resources in the physical racks 202, 204, boot hosts and switches in the physical racks 202, 204, install stacks in the physical racks 202, 204, and make the physical racks 202, 204 inventory ready (block 118). For example, the physical racks 202, 204 are inventory ready for virtual rack managers (VRMs) 225, 227 of FIG. 2 to collect and manage hardware resource inventories of the physical racks 202, 204. The HMSs 208, 214 are described below in connection with FIGS. 2-6. Additional details of the HMSs 208, 214 are also disclosed in U.S. patent application Ser. No. 14/788,004, filed on Jun. 30, 2015, and titled “Methods and Apparatus to Configure Hardware Management Systems for use in Virtual Server Rack Deployments for Virtual Computing Environments,” which is hereby incorporated by reference herein in its entirety.

The VRMs 225, 227 (e.g., an EVO manager) are initialized and allocate resources, start a cloud infrastructure service (e.g., a VMware vCenter server), and create management clusters (block 120). The VRMs 225, 227 are described below in connection with FIGS. 2-6. Additional details of the VRMs 225, 227 are also disclosed in U.S. patent application Ser. No. 14/796,803, filed on Jul. 10, 2015, and titled “Methods and Apparatus to Configure Virtual Resource Managers for use in Virtual Server Rack Deployments for Virtual Computing Environments,” which is hereby incorporated by reference herein in its entirety.

A software defined data center (SDDC) is then ready to run in the virtual server rack 206 on the physical racks 202, 204 (block 122).

FIG. 2 depicts the example physical racks 202, 204 in an example deployment of the virtual server rack 206. In the illustrated example, the first physical rack 202 has an example top-of-rack (ToR) switch A 210, an example ToR switch B 212, an example management switch 207, and an example server host node(0) 209. In the illustrated example, the management switch 207 and the server host node(0) 209 run a hardware management system (HMS) 208 for the first physical rack 202. The second physical rack 204 of the illustrated example is also provided with an example ToR switch A 216, an example ToR switch B 218, an example management switch 213, and an example server host node(0) 211. In the illustrated example, the management switch 213 and the server host node(0) 211 run an HMS 214 for the second physical rack 204.

In the illustrated example, the management switches 207, 213 of the corresponding physical racks 202, 204 run corresponding out-of-band (OOB) agents (e.g., an example OOB agent 612 described below in connection with FIG. 6) and OOB plugins (e.g., an example OOB plugin 621 described below in connection with FIG. 6) of the corresponding HMSs 208, 214. Also in the illustrated example, the server host nodes(0) 209, 211 of the corresponding physical racks 202, 204 run corresponding in-band (IB) agents (e.g., an example IB agent 613 described below in connection with FIG. 6), IB plugins (e.g., an example IB plugin 623 described below in connection with FIG. 6), HMS service APIs (e.g., an example generic HMS service API 610 described below in connection with FIG. 6), and aggregators (e.g., an example HMS aggregator 611 described below in connection with FIG. 6).

In the illustrated example, the HMS 208, 214 connects to server management ports of the server host node(0) 209, 211 (e.g., using a baseboard management controller (BMC)), connects to ToR switch management ports (e.g., using 1 Gbps links) of the ToR switches 210, 212, 216, 218, and also connects to spine switch management ports of one or more spine switches 222. These example connections form a non-routable private Internet protocol (IP) management network for OOB management. The HMS 208, 214 of the illustrated example uses this OOB management interface to the server management ports of the server host node(0) 209, 211 for server hardware management. In addition, the HMS 208, 214 of the illustrated example uses this OOB management interface to the ToR switch management ports of the ToR switches 210, 212, 216, 218 and to the spine switch management ports of the one or more spine switches 222 for switch management. In examples disclosed herein, the ToR switches 210, 212, 216, 218 connect to server network interface card (NIC) ports (e.g., using 10 Gbps links) of server hosts in the physical racks 202, 204 for downlink communications and to the spine switch(es) (e.g., using 40 Gbps links) for uplink communications. In the illustrated example, the management switch 207, 213 is also connected to the ToR switches 210, 212, 216, 218 (e.g., using a 10 Gbps link) for internal communications between the management switch 207, 213 and the ToR switches 210, 212, 216, 218. Also in the illustrated example, the HMS 208, 214 is provided with IB connectivity to individual server nodes (e.g., server nodes in example physical hardware resources 224, 226) of the physical rack 202, 204. In the illustrated example, the IB connection interfaces to physical hardware resources 224, 226 via an operating system running on the server nodes using an OS-specific API such as vSphere API, command line interface (CLI), and/or interfaces such as Common Information Model from Distributed Management Task Force (DMTF).

The HMSs 208, 214 of the corresponding physical racks 202, 204 interface with virtual rack managers (VRMs) 225, 227 of the corresponding physical racks 202, 204 to instantiate and manage the virtual server rack 206 using physical hardware resources 224, 226 (e.g., processors, network interface cards, servers, switches, storage devices, peripherals, power supplies, etc.) of the physical racks 202, 204. In the illustrated example, the VRM 225 of the first physical rack 202 runs on a cluster of three server host nodes of the first physical rack 202, one of which is the server host node(0) 209. As used herein, the term “host” refers to a functionally indivisible unit of the physical hardware resources 224, 226, such as a physical server that is configured or allocated, as a whole, to a virtual rack and/or workload; powered on or off in its entirety; or may otherwise be considered a complete functional unit. Also in the illustrated example, the VRM 227 of the second physical rack 204 runs on a cluster of three server host nodes of the second physical rack 204, one of which is the server host node(0) 211. In the illustrated example, the VRMs 225, 227 of the corresponding physical racks 202, 204 communicate with each other through one or more spine switches 222. Also in the illustrated example, communications between physical hardware resources 224, 226 of the physical racks 202, 204 are exchanged between the ToR switches 210, 212, 216, 218 of the physical racks 202, 204 through the one or more spine switches 222. In the illustrated example, each of the ToR switches 210, 212, 216, 218 is connected to each of two spine switches 222. In other examples, fewer or more spine switches may be used. For example, additional spine switches may be added when physical racks are added to the virtual server rack 206.

The VRM 225 runs on a cluster of three server host nodes of the first physical rack 202 using a high availability (HA) mode configuration. In addition, the VRM 227 runs on a cluster of three server host nodes of the second physical rack 204 using the HA mode configuration. Using the HA mode in this manner enables fault-tolerant operation of the VRM 225, 227 in the event that one of the three server host nodes in the cluster for the VRM 225, 227 fails. In some examples, a minimum of three hosts or fault domains (FDs) are used for a failures-to-tolerate setting of FTT=1. In some examples, a minimum of five hosts or FDs are used for FTT=2. Upon failure of a server host node executing the VRM 225, 227, the VRM 225, 227 can be restarted to execute on another one of the hosts in the cluster. Therefore, the VRM 225, 227 continues to be available even in the event of a failure of one of the server host nodes in the cluster.
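The host/fault-domain minimums mentioned above (three for FTT=1, five for FTT=2) follow the general 2*FTT+1 relationship, sketched below; the function name is an illustrative assumption.

```python
# Small sketch of the minimum implied above: tolerating FTT failures requires
# 2*FTT + 1 hosts or fault domains (3 for FTT=1, 5 for FTT=2).
def minimum_fault_domains(ftt: int) -> int:
    if ftt < 0:
        raise ValueError("FTT must be non-negative")
    return 2 * ftt + 1

assert minimum_fault_domains(1) == 3
assert minimum_fault_domains(2) == 5
```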

In examples disclosed herein, a command line interface (CLI) and APIs are used to manage the ToR switches 210, 212, 216, 218. For example, the HMS 208, 214 uses CLI/APIs to populate switch objects corresponding to the ToR switches 210, 212, 216, 218. On HMS bootup, the HMS 208, 214 populates initial switch objects with statically available information. In addition, the HMS 208, 214 uses a periodic polling mechanism as part of an HMS switch management application thread to collect statistical and health data from the ToR switches 210, 212, 216, 218 (e.g., link states, packet stats, availability, etc.). There is also a configuration buffer as part of the switch object which stores the configuration information to be applied on the switch.
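One way to picture the switch-object pattern described above is the following minimal sketch, with assumed field names and caller-supplied CLI/API callables; it only illustrates populating static information at bootup, periodically polling health data, and keeping a configuration buffer alongside each switch object.

```python
# Hedged sketch (illustrative names only) of the switch-management pattern described above.
import time
from dataclasses import dataclass, field

@dataclass
class SwitchObject:
    name: str
    static_info: dict = field(default_factory=dict)    # filled once at HMS bootup
    health: dict = field(default_factory=dict)         # refreshed by periodic polling
    config_buffer: list = field(default_factory=list)  # configuration to apply on the switch

def poll_switches(switches, read_static, read_health, interval_s=30, cycles=1):
    """read_static/read_health stand in for whatever CLI/API calls an implementation uses."""
    for sw in switches:
        sw.static_info = read_static(sw.name)           # statically available information
    for _ in range(cycles):                             # a real management thread would loop forever
        for sw in switches:
            sw.health = read_health(sw.name)            # link states, packet stats, availability
        time.sleep(interval_s)
```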

FIG. 3 depicts an example configuration of one of the example physical racks 202, 204 of FIG. 2. In the illustrated example of FIG. 3, the HMS 208, 214 is in communication with a physical hardware resource 224, 226 through a management network interface card (NIC) 302. The example HMS 208, 214 is also shown in communication with the example ToR switches 210, 216, 212, 218. The example ToR switches 210, 216, 212, 218 are in communication with a distributed switch 306 through multiple uplink ports 308, 310 of the distributed switch 306. In the illustrated example, the uplink ports 308, 310 are implemented using separate network interface cards (NICs).

In the illustrated example, the distributed switch 306 runs numerous virtual adapters known as virtual machine kernels (VMKs) including an example VMK0 management kernel 314, an example VMK1 vMotion kernel 316, an example VMK2 vSAN kernel 318, and an example VMK3 VXLAN 320. The VMK0 management kernel 314 virtual adapter is software executed by the distributed switch 306 to manage use of ones of or portions of the physical hardware resources 224, 226 allocated for use by the distributed switch 306. In examples disclosed herein, the VRM1 225 of FIG. 2 uses the VMK0 management kernel 314 to communicate with the VRM2 227 through the spine switches 222 of FIG. 2. The VMK1 vMotion 316 virtual adapter is software executed by the distributed switch 306 to facilitate live migration of virtual machines between physical hardware resources 224, 226 with substantially little or no downtime to provide continuous service availability from the virtual machines being migrated. The VMK2 vSAN 318 virtual adapter is software executed by the distributed switch 306 to aggregate locally attached data storage disks in a virtual cluster to create a storage solution that can be provisioned from the distributed switch 306 during virtual machine provisioning operations. The example VMK3 VXLAN 320 is virtual adapter software executed by the distributed switch to establish and/or support one or more virtual networks provisioned in the distributed switch 306. In the illustrated example, the VMK3 VXLAN 320 is in communication with an example network virtualization manager 304. The network virtualization manager 304 of the illustrated example manages virtualized network resources such as physical hardware switches to provide software-based virtual networks. The example network virtualization manager 304 may be implemented using, for example, the VMware NSX® network virtualization manager 416 of FIG. 4. In the illustrated example of FIG. 3, the distributed switch 306 is shown interfacing with one or more of the physical hardware resources 224, 226 through multiple NICs 322, 324. In this manner, the VM kernels 314, 316, 318, 320 can instantiate virtual resources based on one or more, or portions of, the physical hardware resources 224, 226.

The HMS 208, 214 of the illustrated examples of FIGS. 2 and 3 is a stateless software agent responsible for managing individual hardware elements in a physical rack 202, 204. Examples of hardware elements that the HMS 208, 214 manages are servers and network switches in the physical rack 202, 204. In the illustrated example, the HMS 208, 214 is implemented using Java on Linux so that an OOB portion (e.g., the OOB agent 612 of FIG. 6) of the HMS 208, 214 runs as a Java application on a white box management switch (e.g., the management switch 207, 213) in the physical rack 202, 204. However, any other programming language and any other operating system may be used to implement the HMS 208, 214. The physical hardware resources 224, 226 that the HMS 208, 214 manages include white label equipment such as white label servers, white label network switches, white label external storage arrays, and white label disaggregated rack architecture systems (e.g., Intel's Rack Scale Architecture (RSA)). White label equipment is computing equipment that is unbranded and sold by manufacturers to system integrators that install customized software, and possibly other hardware, on the white label equipment to build computing/network systems that meet specifications of end users or customers. The white labeling, or unbranding by original manufacturers, of such equipment enables third-party system integrators to market their end-user integrated systems using the third-party system integrators' branding. In some examples, the HMS 208, 214 may also be used to manage non-white label equipment such as original equipment manufacturer (OEM) equipment. Such OEM equipment includes OEM servers such as Hewlett-Packard® (HP) servers and Lenovo® servers, and OEM switches such as Arista switches, and/or any other OEM servers, switches, or equipment.

FIG. 4 depicts an example architecture 400 in which an example virtual imaging appliance 422 (e.g., the example VIA described in connection with FIG. 1) is utilized to configure and deploy the virtual server rack 206 (e.g., one or more of the example physical racks 202, 204 of FIG. 2).

The example architecture 400 of FIG. 4 includes a hardware layer 402, a virtualization layer 404, and an operations and management component 406. In the illustrated example, the hardware layer 402, the virtualization layer 404, and the operations and management component 406 are part of the example virtual server rack 206 of FIG. 2. The virtual server rack 206 of the illustrated example is based on the physical racks 202, 204 of FIG. 2. Alternatively, either one of the physical racks 202, 204 may be operated in a stand-alone manner to instantiate and run the virtual server rack 206. The example virtual server rack 206 is configured to configure the physical hardware resources 224, 226, to virtualize the physical hardware resources 224, 226 into virtual resources, to provision virtual resources for use in providing cloud-based services, and to maintain the physical hardware resources 224, 226 and the virtual resources. The example architecture 400 includes a virtual imaging appliance (VIA) 422 that communicates with the hardware layer 402 to store operating system (OS) and software images in memory of the hardware layer 402 for use in initializing physical resources needed to configure the virtual server rack 206. In the illustrated example, the VIA 422 retrieves the OS and software images from a virtual system solutions provider image repository 424 via an example network 426 (e.g., the Internet). For example, the VIA 422 may be the VIA provided to a system integrator as described in connection with FIG. 1 by a virtual system solutions provider to configure new physical racks (e.g., the physical racks 202, 204 of FIGS. 2 and 3) for use as virtual server racks (e.g., the virtual server rack 206). That is, whenever the system integrator wishes to configure new hardware (e.g., a new physical rack) for use as a virtual server rack, the system integrator connects the VIA 422 to the new hardware, and the VIA 422 communicates with the virtual system provider image repository 424 to retrieve OS and/or software images needed to configure the new hardware for use as a virtual server rack. In the illustrated example, the OS and/or software images located in the virtual system provider image repository 424 are configured to provide the system integrator with flexibility in selecting to obtain hardware from any of a number of hardware manufacturers. As such, end users can source hardware from multiple hardware manufacturers without needing to develop custom software solutions for each hardware manufacturer. Further details of the example VIA 422 are disclosed in U.S. patent application Ser. No. 14/752,699, filed on Jun. 26, 2015, and titled “Methods and Apparatus for Rack Deployments for Virtual Computing Environments,” which is hereby incorporated herein by reference in its entirety.

The example hardware layer 402 of FIG. 4 includes the HMS 208, 214 of FIGS. 2 and 3 that interfaces with the physical hardware resources 224, 226 (e.g., processors, network interface cards, servers, switches, storage devices, peripherals, power supplies, etc.). The HMS 208, 214 is configured to manage individual hardware nodes such as different ones of the physical hardware resources 224, 226. For example, managing of the hardware nodes involves discovering nodes, bootstrapping nodes, resetting nodes, processing hardware events (e.g., alarms, sensor data threshold triggers) and state changes, and exposing hardware events and state changes to other resources and a stack of the virtual server rack 206 in a hardware-independent manner. The HMS 208, 214 also supports rack-level boot-up sequencing of the physical hardware resources 224, 226 and provides services such as secure resets, remote resets, and/or hard resets of the physical hardware resources 224, 226.

The HMS 208, 214 of the illustrated example is part of a dedicated management infrastructure in a corresponding physical rack 202, 204 including the dual-redundant management switches 207, 213 and dedicated management ports attached to the server host nodes(0) 209, 211 and the ToR switches 210, 212, 216, 218 (FIGS. 2 and 3). In the illustrated example, one instance of the HMS 208, 214 runs per physical rack 202, 204. For example, the HMS 208, 214 may run on the management switch 207, 213 and the server host node(0) 209, 211 installed in the example physical racks 202, 204 of FIG. 2. In the illustrated example of FIG. 2, both of the HMSs 208, 214 are provided in corresponding management switches 207, 213 and the corresponding server host nodes(0) 209, 211 as a redundancy feature in which one of the HMSs 208, 214 is a primary HMS, while the other one of the HMSs 208, 214 is a secondary HMS. In this manner, one of the HMSs 208, 214 may take over as a primary HMS in the event of a failure of a hardware management switch 207, 213 and/or a failure of the server host nodes(0) 209, 211 on which the other HMS 208, 214 executes. In some examples, to achieve seamless failover, two instances of an HMS 208, 214 run in a single physical rack 202, 204. In such examples, the physical rack 202, 204 is provided with two management switches, and each of the two management switches runs a separate instance of the HMS 208, 214. In such examples, the physical rack 202 of FIG. 2 runs two instances of the HMS 208 on two separate physical hardware management switches and two separate server host nodes(0), and the physical rack 204 of FIG. 2 runs two instances of the HMS 214 on two separate physical hardware management switches and two separate server host nodes(0). In this manner, for example, one of the instances of the HMS 208 on the physical rack 202 serves as the primary HMS 208 and the other instance of the HMS 208 serves as the secondary HMS 208. The two instances of the HMS 208 on two separate management switches and two separate server host nodes(0) in the physical rack 202 (or the two instances of the HMS 214 on two separate management switches and two separate server host nodes(0) in the physical rack 204) are connected over a point-to-point, dedicated Ethernet link which carries heartbeats and memory state synchronization between the primary and secondary HMS instances.
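A minimal sketch of the failover idea described above follows, with assumed timing values and caller-supplied helpers; it is not the disclosed implementation, only the general pattern of a secondary instance promoting itself when heartbeats over the dedicated link stop arriving.

```python
# Illustrative sketch: the secondary HMS watches heartbeats from the primary and
# takes over when they stop. Timing and function names are assumptions.
import time

HEARTBEAT_TIMEOUT_S = 15

def secondary_hms_loop(receive_heartbeat, promote_to_primary, now=time.time):
    """receive_heartbeat() returns True if a heartbeat arrived since the last call."""
    last_seen = now()
    while True:
        if receive_heartbeat():
            last_seen = now()
        elif now() - last_seen > HEARTBEAT_TIMEOUT_S:
            promote_to_primary()  # memory state was kept in sync over the dedicated link
            return
        time.sleep(1)
```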

There are numerous categories of failures that the HMS 208, 214 can encounter. Some example failure categories are shown below in Table 1.

TABLE 1
HMS Failure Categories

Failure Type 1: HMS Agent Software Failures
Examples: Unable to allocate new resources; memory corruption; software crash; CPU hogging; memory leaks
Impact: Short-term loss of HMS function [Minutes]
Remediation: Restart from HMS Monitor

Failure Type 2: HMS Agent Unrecoverable Software Failure
Examples: Unable to start daemon; unable to resolve Failure Type 1; consistent software crash
Impact: Longer-term loss of HMS function [Hours]
Remediation: Maintenance-mode thin HMS agent until issue resolved

Failure Type 3: Management Switch Operating System Software Failures
Examples: Process failures; kernel failures; unable to boot switch OS; ONIE/bootloader issues
Impact: Short- to long-term loss of management switch and HMS function
Remediation: Process restart for user processes; reboots for kernel failures; manual intervention for failed boots

Failure Type 4: Management Switch Hardware Failures
Examples: Link down on management ports to server; link down on management ports to ToR nodes; link down from VRM host to HMS on management switch; critical hardware alarms
Impact: Portions of rack unavailable; VRM-HMS communication loss
Remediation: Reset links from PRM; notify VRM for manual intervention

Failure Type 5: Management Switch Unrecoverable Hardware Failure
Examples: Management switch fails to boot; erratic resets of hardware while running
Impact: Long-term loss of HMS/management switch
Remediation: Manual intervention or standby switch

In the illustrated example of FIG. 4, the hardware layer 402 includes an example HMS monitor 428 to monitor the operational status and health of the HMS 208, 214. The example HMS monitor 428 is an external entity outside of the context of the HMS 208, 214 that detects and remediates failures in the HMS 208, 214. That is, the HMS monitor 428 is a process that runs outside the HMS daemon to monitor the daemon. For example, the HMS monitor 428 can run alongside the HMS 208, 214 in the same management switch 207, 213 as the HMS 208, 214. The example HMS monitor 428 is configured to monitor for Type 1 failures of Table 1 above and restart the HMS daemon when required to remediate such failures. The example HMS monitor 428 is also configured to invoke an HMS maintenance mode daemon to monitor for Type 2 failures of Table 1 above. In examples disclosed herein, an HMS maintenance mode daemon is a minimal HMS agent that functions as a basic backup of the HMS 208, 214 until the Type 2 failure of the HMS 208, 214 is resolved.
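The monitoring behavior described above can be sketched as follows, with hypothetical process-control helpers; the failure-type numbers correspond to Table 1, and everything else is an illustrative assumption rather than the disclosed implementation.

```python
# Sketch of the watchdog behavior: restart the HMS daemon on Type 1 failures, and
# hand off to a maintenance-mode daemon while a Type 2 failure is investigated.
def hms_monitor_step(hms_daemon, maintenance_daemon, classify_failure):
    if hms_daemon.is_healthy():
        return "ok"
    failure_type = classify_failure(hms_daemon)   # e.g., inspect crash logs, memory use
    if failure_type == 1:                         # short-term software failure
        hms_daemon.restart()
        return "restarted"
    if failure_type == 2:                         # daemon cannot be recovered automatically
        maintenance_daemon.start()                # minimal backup agent until resolved
        return "maintenance-mode"
    return "unknown"
```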

The example virtualization layer 404 includes the virtual rack manager (VRM) 225, 227. The example VRM 225, 227 communicates with the HMS 208, 214 to manage the physical hardware resources 224, 226. The example VRM 225, 227 creates the example virtual server rack 206 out of underlying physical hardware resources 224, 226 that may span one or more physical racks (or smaller units such as a hyper-appliance or half rack) and handles physical management of those resources. The example VRM 225, 227 uses the virtual server rack 206 as a basis of aggregation to create and provide operational views, handle fault domains, and scale to accommodate workload profiles. The example VRM 225, 227 keeps track of available capacity in the virtual server rack 206, maintains a view of a logical pool of virtual resources throughout the SDDC life-cycle, and translates logical resource provisioning to allocation of physical hardware resources 224, 226. The example VRM 225, 227 interfaces with components of the virtual system solutions provider described in connection with FIG. 1 such as an example VMware vSphere® virtualization infrastructure components suite 408, an example VMware vCenter® virtual infrastructure server 410, an example ESXi™ hypervisor component 412, an example VMware NSX® network virtualization platform 414 (e.g., a network virtualization component or a network virtualizer), an example VMware NSX® network virtualization manager 416, and an example VMware vSAN™ network data storage virtualization component 418 (e.g., a network data storage virtualizer). In the illustrated example, the VRM 225, 227 communicates with these components to manage and present the logical view of underlying resources such as hosts and clusters. The example VRM 225, 227 also uses the logical view for orchestration and provisioning of workloads. Additional details of the VRM 225, 227 are disclosed below in connection with FIG. 5.

The VMware vSphere® virtualization infrastructure components suite 408 of the illustrated example is a collection of components to set up and manage a virtual infrastructure of servers, networks, and other resources. Example components of the VMware vSphere® virtualization infrastructure components suite 408 include the example VMware vCenter® virtual infrastructure server 410 and the example ESXi™ hypervisor component 412.

The example VMware vCenter® virtual infrastructure server 410 provides centralized management of a virtualization infrastructure (e.g., a VMware vSphere® virtualization infrastructure). For example, the VMware vCenter® virtual infrastructure server 410 provides centralized management of virtualized hosts and virtual machines from a single console to provide IT administrators with access to inspect and manage configurations of components of the virtual infrastructure.

The example ESXi™ hypervisor component 412 is a hypervisor that is installed and runs on servers (e.g., the example physical servers 616 of FIG. 6) in the example physical resources 224, 226 to enable the servers to be partitioned into multiple logical servers to create virtual machines.

The example VMware NSX® network virtualization platform 414 (e.g., a network virtualization component or a network virtualizer) virtualizes network resources such as physical hardware switches (e.g., the physical switches 618 of FIG. 6) to provide software-based virtual networks. The example VMware NSX® network virtualization platform 414 enables treating physical network resources (e.g., switches) as a pool of transport capacity. In some examples, the VMware NSX® network virtualization platform 414 also provides network and security services to virtual machines with a policy driven approach.

The example VMware NSX® network virtualization manager 416 manages virtualized network resources such as physical hardware switches (e.g., the physical switches 618 of FIG. 6) to provide software-based virtual networks. In the illustrated example, the VMware NSX® network virtualization manager 416 is a centralized management component of the VMware NSX® network virtualization platform 414 and runs as a virtual appliance on an ESXi host (e.g., one of the physical servers 616 of FIG. 6 running an ESXi™ hypervisor 412). In the illustrated example, a VMware NSX® network virtualization manager 416 manages a single vCenter server environment implemented using the VMware vCenter® virtual infrastructure server 410. In the illustrated example, the VMware NSX® network virtualization manager 416 is in communication with the VMware vCenter® virtual infrastructure server 410, the ESXi™ hypervisor component 412, and the VMware NSX® network virtualization platform 414.

The example VMware vSAN™ network data storage virtualization component 418 is software-defined storage for use in connection with virtualized environments implemented using the VMware vSphere® virtualization infrastructure components suite 408. The example VMware vSAN™ network data storage virtualization component 418 clusters server-attached hard disk drives (HDDs) and solid state drives (SSDs) to create a shared datastore for use as virtual storage resources in virtual environments.

Although the example VMware vSphere® virtualization infrastructure components suite 408, the example VMware vCenter® virtual infrastructure server 410, the example ESXi™ hypervisor component 412, the example VMware NSX® network virtualization platform 414, the example VMware NSX® network virtualization manager 416, and the example VMware vSAN™ network data storage virtualization component 418 are shown in the illustrated example as implemented using products developed and sold by VMware, Inc., some or all of such components may alternatively be supplied by components with the same or similar features developed and sold by other virtualization component developers.

The virtualization layer 404 of the illustrated example and its associated components are configured to run virtual machines. However, in other examples, the virtualization layer 404 may additionally or alternatively be configured to run containers. A virtual machine is a data computer node that operates with its own guest operating system on a host using resources of the host virtualized by virtualization software. A container is a data computer node that runs on top of a host operating system without the need for a hypervisor or separate operating system.

The virtual server rack 206 of the illustrated example enables abstracting the physical hardware resources 224, 226. In some examples, the virtual server rack 206 includes a set of physical units (e.g., one or more racks) with each unit including hardware 224, 226 such as server nodes (e.g., compute+storage+network links), network switches, and, optionally, separate storage units. From a user perspective, the example virtual server rack 206 is an aggregated pool of logical resources exposed as one or more vCenter ESXi™ clusters along with a logical storage pool and network connectivity. In examples disclosed herein, a cluster is a server group in a virtual environment. For example, a vCenter ESXi™ cluster is a group of physical servers (e.g., example physical servers 616 of FIG. 6) in the physical hardware resources 224, 226 that run ESXi™ hypervisors (developed and sold by VMware, Inc.) to virtualize processor, memory, storage, and networking resources into logical resources to run multiple virtual machines that run operating systems and applications as if those operating systems and applications were running on physical hardware without an intermediate virtualization layer.

In the illustrated example, the example operations and management (OAM) component 406 is an extension of a VMware vCloud® Automation Center (VCAC) that relies on the VCAC functionality and also leverages utilities such as vRealize, Log Insight™, and Hyperic® to deliver a single point of SDDC operations and management. The example OAM component 406 is configured to provide different services such as heat-map service, capacity planner service, maintenance planner service, events and operational view service, and virtual rack application workloads manager service.

In the illustrated example, a heat map service of the OAM component 406 exposes component health for hardware mapped to virtualization and application layers (e.g., to indicate good, warning, and critical statuses). The example heat map service also weighs real-time sensor data against offered service level agreements (SLAs) and may trigger some logical operations to make adjustments to ensure continued SLA.

In the illustrated example, the capacity planner service of the OAM component 406 checks against available resources and looks for potential bottlenecks before deployment of an application workload. The example capacity planner service also integrates additional rack units in the collection/stack when capacity is expanded.

In the illustrated example, the maintenance planner service of the OAM component 406 dynamically triggers a set of logical operations to relocate virtual machines (VMs) before starting maintenance on a hardware component to increase the likelihood of substantially little or no downtime. The example maintenance planner service of the OAM component 406 creates a snapshot of the existing state before starting maintenance on an application. The example maintenance planner service of the OAM component 406 automates software upgrade/maintenance by creating a clone of the machines and then upgrading software on the clones, pausing the running machines, and attaching the clones to a network. The example maintenance planner service of the OAM component 406 also performs rollbacks if upgrades are not successful.
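The upgrade/rollback sequence described above can be sketched as follows; the object methods are placeholders for whatever snapshot, clone, and network-attachment operations a particular implementation provides, and are not defined by this disclosure.

```python
# Hedged sketch of the clone-upgrade-cutover-rollback flow described above.
def upgrade_with_rollback(app, network):
    snapshot = app.create_snapshot()               # capture the existing state first
    clones = [vm.clone() for vm in app.machines]   # the upgrade proceeds on clones
    try:
        for clone in clones:
            clone.upgrade_software()
        for vm in app.machines:
            vm.pause()                             # quiesce the running machines
        for clone in clones:
            network.attach(clone)                  # cut traffic over to the upgraded clones
    except Exception:
        app.restore_snapshot(snapshot)             # roll back if the upgrade fails
        for vm in app.machines:
            vm.resume()
        raise
```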

In the illustrated example, an events and operational views service of the OAM component 406 provides a single dashboard for logs by feeding to Log Insight. The example events and operational views service of the OAM component 406 also correlates events from the heat map service against logs (e.g., a server starts to overheat, connections start to drop, lots of HTTP/503 from App servers). The example events and operational views service of the OAM component 406 also creates a business operations view (e.g., a top down view from Application Workloads=>Logical Resource View=>Physical Resource View). The example events and operational views service of the OAM component 406 also provides a logical operations view (e.g., a bottom up view from Physical resource view=>vCenter ESXi Cluster View=>VM's view).

In the illustrated example, the virtual rack application workloads manager service of the OAM component 406 uses vCAC and vCAC enterprise services to deploy applications to vSphere hosts. The example virtual rack application workloads manager service of the OAM component 406 uses data from the heat map service, the capacity planner service, the maintenance planner service, and the events and operational views service to build intelligence to pick the best mix of applications on a host (e.g., not put all high CPU intensive apps on one host). The example virtual rack application workloads manager service of the OAM component 406 optimizes applications and virtual storage area network (vSAN) arrays to have high data resiliency and best possible performance at the same time.

FIG. 5 depicts another view of the example architecture 400 of FIG. 4 showing the example HMS 208, 214 of FIGS. 2-4 interfacing between the example physical hardware resources 224, 226 of FIGS. 2-4 and the example VRM 225, 227 of the example architecture 400 of FIG. 4. In the illustrated example, the VRM 225, 227 includes numerous application program interfaces (APIs) 502, 504, 506, 508 to interface with other components of the architecture 400. The APIs 502, 504, 506, 508 of the illustrated example include routines, protocols, function calls, and other components defined for use by external programs, routines, or components to communicate with the VRM 225, 227. Such communications may include sending information to the VRM 225, 227, requesting information from the VRM 225, 227, requesting the VRM 225, 227 to perform operations, configuring the VRM 225, 227, etc. For example, an HMS API interface 502 of the VRM 225, 227 is to facilitate communications between the HMS 208, 214 and the VRM 225, 227, another API interface 506 of the VRM 225, 227 is to facilitate communications between the operations and management component 406 and the VRM 225, 227, and another API interface 508 of the VRM 225, 227 is to facilitate communications between the VRM 225, 227 and the network virtualization manager 304 and a vCenter server 510. Another API interface 504 of the VRM 225, 227 may be used to facilitate communications between the VRM 225, 227 and user interfaces for use by administrators to manage the VRM 225, 227.

The example VRM 225, 227 communicates with the HMS 208, 214 via the HMS API interface 502 to manage the physical hardware resources 224, 226. For example, the VRM 225, 227 obtains and maintains an inventory of the physical hardware resources 224, 226 through communications with the HMS 208, 214. The example VRM 225, 227 also uses the HMS 208, 214 to discover new hardware (e.g., the physical hardware resources 224, 226) and adds newly discovered hardware to the inventory. The example VRM 225, 227 is also configured to manage the physical hardware resources 224, 226 within the virtual server rack 206 by using the per-rack HMS 208, 214. The example VRM 225, 227 maintains the notion of fault domains and uses those domains in its mapping of logical resources (e.g., virtual resources) to the physical hardware resources 224, 226. In response to notification of hardware events from the HMS 208, 214, the example VRM 225, 227 handles addition/removal of physical hardware resources 224, 226 (e.g., servers or switches at a physical rack level), addition of new rack units, maintenance, and hard shutdowns/resets. The example VRM 225, 227 also translates physical sensor data and alarms to logical events.

In the illustrated example of FIG. 5, a software stack of the VRM 225, 227 includes an example workflow services engine 514, an example resource aggregation and correlations engine 516, an example physical resource manager (PRM) 518, an example logical resource manager (LRM) 520, an example broadcasting and election manager 522, an example security manager 524, an example asset inventory and license manager 526, an example logical object generation engine 528, an example event process manager 530, an example VRM directory 532, example extensibility tools 534, an example configuration component service 536, an example VRM configuration component 538, and an example configuration user interface (UI) 540. The example VRM 225, 227 also includes an example VRM data store 542. The example workflow services engine 514 is provided to manage the workflows of services provisioned to be performed by resources of the virtual server rack 206. The example resource aggregation and correlations engine 516 is provided to aggregate logical and physical resources and to coordinate operations between the logical and physical resources for allocating to services to be performed by the virtual server rack 206. The example PRM 518 is provided to provision, maintain, allocate, and manage the physical hardware resources 224, 226 for use by the virtual server rack 206 for provisioning and allocating logical resources. The example LRM 520 is provided to provision, maintain, allocate, and manage logical resources.

The example broadcasting and election manager 522 is provided to broadcast or advertise capabilities of the virtual server rack 206. For example, services seeking resources of virtual server racks may obtain capabilities (e.g., logical resources) that are available from the virtual server rack 206 by receiving broadcasts or advertisements of such capabilities from the broadcasting and election manager 522. The broadcasting and election manager 522 is also configured to identify resources of the virtual server rack 206 that have been requested for allocation. The example security manager 524 is provided to implement security processes to protect from misuse of resources of the virtual server rack 206 and/or to protect from unauthorized accesses to the virtual server rack 206.

In the illustrated example, the broadcasting and election manager 522 is also provided to manage an example primary VRM selection process. In examples disclosed herein, a primary VRM selection process is performed by the VRM 225, 227 to determine a VRM that is to operate as the primary VRM for a virtual server rack. For example, as shown in FIG. 2, the example virtual server rack 206 includes the first VRM 225 that runs in the first physical rack 202, and the second VRM 227 that runs in the second physical rack 204. In the illustrated example of FIG. 2, the first VRM 225 and the second VRM 227 communicate with each other to perform the primary VRM selection process. For example, the first VRM 225 may perform a process to obtain information from the second VRM 227 and execute an algorithm to decide whether it (the first VRM 225) or the second VRM 227 is to be the primary VRM to manage virtual resources of all the physical racks 202, 204 of the virtual server rack 206. In some examples, the broadcasting and election manager 522 instantiates a zookeeper of the corresponding VRM 225, 227. In some examples, the broadcasting and election manager 522 performs the primary VRM selection process as part of the zookeeper.

The example asset inventory and license manager 526 is provided to manage inventory of components of the virtual server rack 206 and to ensure that the different components of the virtual server rack 206 are used in compliance with licensing requirements. In the illustrated example, the example asset inventory and license manager 526 also communicates with licensing servers to ensure that the virtual server rack 206 has up-to-date licenses in place for components of the virtual server rack 206. The example logical object generation engine 528 is provided to generate logical objects for different portions of the physical hardware resources 224, 226 so that the logical objects can be used to provision logical resources based on the physical hardware resources 224, 226. The example event process manager 530 is provided to manage instances of different processes running in the virtual server rack 206. The example VRM directory 532 is provided to track identities and availabilities of logical and physical resources in the virtual server rack 206. The example extensibility tools 534 are provided to facilitate extending capabilities of the virtual server rack 206 by adding additional components such as additional physical racks to form the virtual server rack 206.

The example configuration component service 536 finds configuration components for virtualizing the physical rack 202, 204 and obtains configuration parameters that such configuration components need for the virtualization process. The example configuration component service 536 calls the configuration components with their corresponding configuration parameters and events. The example configuration component service 536 maps the configuration parameters to user interface properties of the example configuration UI 540 for use by administrators to manage the VRM 225, 227 through an example VRM portal 544. The example VRM portal 544 is a web-based interface that provides access to one or more of the components of the VRM 225, 227 to enable an administrator to configure the VRM 225, 227.

The example VRM configuration component 538 implements configurator components that include configuration logic for configuring virtualization components of the example virtualization layer 404 of FIG. 4.

The example VRM data store 542 is provided to store configuration information, provisioning information, resource allocation information, and/or any other information used by the VRM 225, 227 to manage hardware configurations, logical configurations, workflows, services, etc. of the virtual server rack 206.

Upon startup of the VRM 225, 227 of the illustrated example, the VRM 225, 227 is reconfigured with new network settings. To reconfigure the new network settings across backend components (e.g., the VMware vCenter® virtual infrastructure server 410, the ESXi™ hypervisor component 412, the VMware NSX® network virtualization platform 414, the VMware NSX® network virtualization manager 416, and the VMware vSAN™ network data storage virtualization component 418 of FIG. 4), the VRM 225, 227 serves the example configuration UI 540 to make configuration parameters accessible by an administrator. The VRM 225, 227 of the illustrated example allows a component to be plugged in and participate in IP address allocation/reallocation. For example, an IP reallocation service may be accessible via the configuration UI 540 so that a user can call the IP reallocation service upon plugging in a component. The example VRM 225, 227 logs status messages into the VRM data store 542, provides status updates to the configuration UI 540, and provides failure messages to the configuration UI 540. The example VRM 225, 227 allows components (e.g., the example VMware vCenter® virtual infrastructure server 410 of FIG. 4, the example ESXi™ hypervisor component 412 of FIG. 4, the example VMware NSX® network virtualization platform 414 of FIG. 4, the example VMware NSX® network virtualization manager 416 of FIG. 4, the example VMware vSAN™ network data storage virtualization component 418 of FIG. 4, and/or any other physical and/or virtual components) to specify the number of IP addresses required, including zero if none are required. In addition, the example VRM 225, 227 allows components to specify their sequence number, which can be used by the VRM 225, 227 during an IP reallocation process to call the components to allocate IP addresses. The example VRM 225, 227 also enables configuration sharing through common objects so that components can obtain new and old IP addresses of other components. The example VRM 225, 227 stores IP addresses of the components in the VRM data store 542.

In the illustrated example, the operations and management component 406 is in communication with the VRM 225, 227 via the API interface 506 to provide different services such as the heat map service, capacity planner service, maintenance planner service, events and operational views service, and virtual rack application workloads manager service. In the illustrated example, the network virtualization manager 304 and the vCenter server 510 are in communication with the VRM 225, 227 to instantiate, manage, and communicate with virtual networks and virtual infrastructures. For example, the network virtualization manager 304 of the illustrated example may be implemented using the VMware NSX® network virtualization manager 416 of FIG. 4 to virtualize network resources such as physical hardware switches to provide software-based virtual networks. The example vCenter server 510 provides a centralized and extensible platform for managing virtual infrastructures. For example, the vCenter server 510 may be implemented using the VMware vCenter® virtual infrastructure server 410 of FIG. 4 to provide centralized management of virtual hosts and virtual machines from a single console. The vCenter server 510 of the illustrated example communicates with the VRM 225, 227 via the API interface 508 to provide administrators with views of and access to configurations of the virtual server rack 206.

The vCenter server 510 of the illustrated example includes an example Single Sign On (SSO) server 552 to enable administrators to access and/or configure the VRM 225, 227. The example SSO server 552 may be implemented using a web browser SSO profile of Security Assertion Markup Language 2.0 (SAML 2.0). In the illustrated example, an SSO user interface of the SSO server 552 is accessible through the example VRM portal 544. In this manner, the VRM 225, 227 is made accessible yet protected using an SSO profile.

FIG. 6 depicts example hardware management application program interfaces (APIs) 602 of the HMS 208, 214 of FIGS. 2-5 that are between the example physical hardware resources 224, 226 of FIGS. 2-5 and the example PRM 518. The example PRM 518 is a component of the VRM 225, 227 (FIGS. 4 and 5) in the software stack of the virtual server rack 206 (FIG. 2). An example PRM 518 is provided in each physical rack 202, 204 and is configured to manage the corresponding physical hardware resources 224, 226 of the corresponding physical rack 202, 204 (FIG. 2) and to maintain a software physical rack object for the corresponding physical rack 202, 204. The example PRM 518 interfaces with the corresponding HMS 208, 214 of the same physical rack 202, 204 to manage individual physical hardware resources 224, 226. In some examples, the PRM 518 runs an HMS monitor thread (e.g., similar to or part of the HMS monitor 428 of FIG. 4) to monitor a management switch 207, 213 that runs the HMS 208, 214 for Type 4 and Type 5 failures shown in Table 1 above. In some examples, the HMS monitor thread in the PRM 518 also monitors for some Type 3 failures shown in Table 1 above when an OS of the management switch 207, 213 needs external intervention.

In the illustrated example, the PRM 518 provides a set of LRM APIs 606 for use of the physical rack object (e.g., the generic pRACK object 624 of FIG. 6) by the example LRM 520 (FIG. 5). The example LRM 520 interacts with individual PRM 518 instances to employ physical resources based on physical resource requirements of the LRM 520. In some examples, the PRM 518 runs as part of an LRM application on a given server node in a virtual server rack 206. In the illustrated example, the LRM 520 is implemented using Java on Linux. However, any other programming language and any other operating system may be used. The PRM 518 of the illustrated example runs in an x86-based Linux Virtual Machine environment as part of the VRM 225, 227 on a designated server node in the physical rack 202, 204.

In the illustrated example of FIG. 6, the HMS 208, 214 publishes a set of generic HMS service APIs 610 for use by original equipment manufacturers (OEMs) to integrate hardware or software with the software stack of the virtual server rack 206. In the illustrated example, the integration point for OEM components is the hardware management APIs 602. In the illustrated example, vendor-specific plugin interfaces 614 may be developed for use by the hardware management API 602 to facilitate communications with physical hardware resources 224, 226 of particular vendors having vendor-specific interfaces. In the illustrated example, such vendor-specific plugin interfaces 614 interface to corresponding physical hardware resources 224, 226 using interface protocols supported by the underlying hardware components (e.g., an IPMI API, a representational state transfer (REST) API, an extensible markup language (XML) API, a hypertext transfer protocol (HTTP) API, a common information model (CIM) API, etc.). In the illustrated example, the physical hardware resources 224, 226 are shown as one or more physical server(s) 616, one or more physical switch(es) 618, and external storage 620. The physical switches 618 of the illustrated example include the management switch 207, 213 and the ToR switches 210, 212, 216, 218 of FIG. 2.

In the illustrated example, the HMS 208, 214 provides the set of example generic HMS service APIs 610 for use by the PRM 518 to access and use virtual resources based on the physical hardware resources 224, 226. In the illustrated example, the generic HMS service APIs 610 are not specific to any particular vendor and/or hardware and are implemented using a REST/JSON (JavaScript object notation) API protocol. However, any other API protocol may be used. The example generic HMS service APIs 610 act on the underlying physical hardware resources 224, 226, which are encapsulated in a set of software objects such as server objects 632, switch objects 634, and storage objects 636. In the illustrated example, the HMS 208, 214 maintains the server objects 632, the switch objects 634, and the storage objects 636, and their associated properties. In the illustrated example, the HMS 208, 214 runs the generic HMS service APIs 610 on the example server host node(0) 209, 211 (FIG. 2) to interface with the example PRM 518 and with an example HMS aggregator 611. The example HMS aggregator 611 runs on the example server host node(0) 209, 211 to aggregate data from an example OOB agent 612 and an example IB agent 613 to expose such data to the PRM 518 and, thus, the VRM 225, 227 (FIGS. 2, 4, and 5). In addition, the HMS aggregator 611 obtains data from the PRM 518 and parses the data out to corresponding ones of the OOB agent 612 for communicating to the physical hardware resources 224, 226, and to the IB agent 613 for communicating to software components. In the illustrated example, the OOB agent 612 runs on the management switch 207, 213, and the IB agent 613 runs on the server host node(0) 209, 211. The example OOB agent 612 interfaces with the physical resources 224, 226 and interfaces with the HMS aggregator 611. The example IB agent 613 interfaces with operating systems and interfaces with the HMS aggregator 611. That is, in the illustrated example, the OOB agent 612 is configured to communicate with vendor hardware via vendor-specific interfaces. The example IB agent 613 is configured to communicate with OS-specific plugins and does not communicate directly with hardware. Instead, the IB agent 613 communicates with operating systems to obtain information from hardware when such information cannot be obtained by the OOB agent 612. For example, the OOB agent 612 may not be able to obtain all types of hardware information (e.g., hard disk drive or solid state drive firmware version). In such examples, the IB agent 613 can request such hardware information from operating systems.
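
For illustration only, the following is a minimal sketch of how a client such as the PRM 518 might invoke a REST/JSON style generic HMS service API. The host address, port, endpoint path, and JSON payload shape are assumptions made for the sketch; the examples above specify only that the generic HMS service APIs 610 are REST/JSON based, not the exact routes.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Minimal sketch of a PRM-side REST/JSON call to a generic HMS service API.
// The base URL, path, and response shape below are hypothetical placeholders.
public final class GenericHmsClientSketch {
    private static final String HMS_BASE = "http://192.168.0.2:8448"; // hypothetical HMS address/port

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Hypothetical GET corresponding to PRM_HMS_GET_SERVER_OBJECT_PROP (Key, Value).
        HttpRequest getProp = HttpRequest.newBuilder()
                .uri(URI.create(HMS_BASE + "/hms/server/objects/host0/properties/powerState"))
                .header("Accept", "application/json")
                .GET()
                .build();

        HttpResponse<String> response = client.send(getProp, HttpResponse.BodyHandlers.ofString());
        // e.g., {"key":"powerState","value":"S0"} in this hypothetical payload shape.
        System.out.println("HMS returned: " + response.body());
    }
}
```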

In examples disclosed herein, server and switch plugin APIs are to be implemented by vendor-supplied plugins for vendor-specific hardware. For example, such server and switch plugin APIs are implemented using OOB interfaces according to an HMS specification. For vendor-specific plugin interfaces 614 that do not support OOB communication based on the vendor-supplied plugin, the HMS 208, 214 implements an IB plugin 623 to communicate with the vendor's hardware via an operating system plugin using IB communications. For example, the IB plugin 623 in the HMS 208, 214 interfaces to the operating system running on the server node (e.g., the server node implemented by the vendor's hardware) using an OS-provided mechanism such as OS APIs (e.g., vSphere APIs), OS command line interfaces (CLI) (e.g., ESX CLI), and/or Distributed Management Task Force (DMTF) Common Information Model (CIM) providers.

The example HMS 208, 214 internally maintains the hardware management API 602 to service API requests received at the generic HMS service APIs 610. The hardware management API 602 of the illustrated example is vendor-specific and is implemented as a vendor-specific plugin to the HMS 208, 214. The hardware management API 602 includes example OOB plugins 621 to interface with vendor-specific plugin interfaces 614 to communicate with the actual physical hardware resources 224, 226. For example, the OOB plugin 621 interfaces with the example OOB agent 612 to exchange data between the generic HMS service APIs 610 and the vendor-specific plugin interface 614. Example vendor-specific interfaces 614 may be proprietary to corresponding OEM vendors for hardware management. Regardless of whether the vendor-specific interfaces 614 are proprietary, or part of an industry standard or open interface, the published hardware management API 602 is configured to work seamlessly between the PRM 518 and the physical hardware resources 224, 226 to manage the physical hardware resources 224, 226. To communicate with the physical hardware resources 224, 226 via operating systems, the hardware management API 602 is provided with an example IB plugin 623. That is, in the illustrated example, the IB plugin 623 operates as an OS plugin for the IB agent 613 to communicate with operating systems.

In the illustrated examples, the HMS 208, 214 uses the example OOB agent 612 and the example OOB plugin 621 for OOB management of the physical hardware resources 224, 226, and uses the example IB agent 613 and the example IB plugin 623 for IB management of the physical hardware resources 224, 226. In examples disclosed herein, OOB components such as the OOB agent 612 and the OOB plugin 621 run in the management switch 207, 213, and IB components such as the IB agent 613, the IB plugin 623, the generic HMS service APIs 610, and the HMS aggregator 611 run in the server host node(0) 209, 211. Such separation of IB management and OOB management components of the HMS 208, 214 facilitates increased resiliency of the HMS 208, 214 in case of failure of either the IB management channel or the OOB management channel. Such IB and OOB management separation also simplifies the network configuration of the ToR switches 210, 212, 216, 218 (FIGS. 2 and 3) and keeps the management network isolated for security purposes. In examples disclosed herein, a single generic API interface (e.g., a REST API, a JSON API, etc.) implementing the example generic HMS service APIs 610 is provided between the PRM 518 and the HMS 208, 214 to facilitate hiding all hardware and vendor specificities of hardware management in the HMS 208, 214 and isolating the complexity of such hardware and vendor specificities from upper layer processes in the PRM 518 and/or the LRM 520.

In examples disclosed herein, the HMS 208, 214 uses an IPMI/DCMI (Data Center Manageability Interface) for OOB management. Example OOB operations performed by the HMS 208, 214 include discovery of new hardware, bootstrapping, remote power control, authentication, hard resetting of non-responsive hosts, monitoring catastrophic hardware failures, and firmware upgrades. In examples disclosed herein, an Integrated BMC (baseboard management controller) Embedded local area network (LAN) channel is used for OOB management of server hosts 616. In examples disclosed herein, one dedicated interface is enabled for OOB management traffic. In such examples, the interface is enabled for dynamic host configuration protocol (DHCP) and connected to a management switch (e.g., the management switch 207, 213 running the HMS 208, 214). In examples disclosed herein, an administrative user is created to operate the dedicated interface for OOB management traffic. An example HMS OOB thread uses IPMI commands to discover and manage server nodes 616 over the dedicated interface for OOB management traffic. Example IPMI features that may be used over the Integrated BMC Embedded LAN for OOB management traffic include the following properties and sensors.

Properties

-   Device ID
-   Cold Reset
-   Get Self Test Results
-   Set/Get ACPI Power State
-   Set/Get User Name
-   Set/Get User Access
-   Set/Get User Password
-   Get Chassis Status
-   Chassis Control Power Down/Up/Power Cycle/Hard Reset
-   Chassis Identity
-   Set/Get System Boot Options
-   Get System Restart Cause
-   Set/Get LAN Configuration
-   DHCP Host Name
-   Authentication Type Support
-   Authentication Type Enable
-   Primary RMCP Port Number
-   Default Gateway

Sensors

-   Power Unit Status
-   BMC Firmware Health
-   HDD Status
-   Processor Status
-   Processor DIMM
-   Processor Temperature
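
For illustration only, the following is a minimal sketch of how the OOB operations and IPMI features listed above could be exercised against a BMC over the Integrated BMC Embedded LAN channel, here by invoking the commonly used ipmitool client from Java. The BMC address, user, and password are placeholders; an HMS OOB thread could issue equivalent IPMI commands programmatically instead of shelling out.

```java
import java.io.IOException;
import java.util.List;

// Sketch of OOB management calls via the ipmitool CLI. All addresses and
// credentials below are placeholders, not values from the examples above.
public final class OobIpmiSketch {
    private static int run(List<String> cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        String bmc = "10.0.0.50";    // placeholder BMC management IP address
        String user = "hms-admin";   // administrative user created for OOB traffic
        String pass = "changeme";    // placeholder password

        // Get chassis status (corresponds to the Get Chassis Status property above).
        run(List.of("ipmitool", "-I", "lanplus", "-H", bmc, "-U", user, "-P", pass,
                "chassis", "status"));

        // Power cycle a non-responsive host (Chassis Control Power Cycle).
        run(List.of("ipmitool", "-I", "lanplus", "-H", bmc, "-U", user, "-P", pass,
                "chassis", "power", "cycle"));

        // Read temperature sensors (e.g., Processor Temperature).
        run(List.of("ipmitool", "-I", "lanplus", "-H", bmc, "-U", user, "-P", pass,
                "sdr", "type", "Temperature"));
    }
}
```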

The example HMS 208, 214 uses IB management to periodically monitor status and health of the physical resources 224, 226 and to keep the server objects 632 and switch objects 634 up to date. In examples disclosed herein, the HMS 208, 214 uses Distributed Management Task Force (DMTF) Common Information Model (CIM) providers in a VMware ESXi™ hypervisor and a CIM client for IB management. The CIM is the software framework used for managing hardware devices and services defined by the DMTF and supported in the VMware ESXi™ hypervisor. CIM providers are classes that receive and fulfill client requests dispatched to them by a CIM object manager (CIMOM). For example, when an application requests dynamic data from the CIMOM, it uses the CIM provider interfaces to pass the request to the CIM provider. Example IB operations performed by the HMS 208, 214 include controlling power state, accessing temperature sensors, controlling BIOS (Basic Input/Output System) inventory of hardware (e.g., CPUs, memory, disks, etc.), event monitoring, and logging events. In examples disclosed herein, the main components which the HMS 208, 214 monitors using IB management are I/O devices (e.g., Network Interface Cards, PCI-e interfaces, and Disk Drives). In examples disclosed herein, the HMS 208, 214 uses CIM providers to monitor such I/O devices. Example CIM providers may be developed as VMware ESXi™ hypervisor userworlds to interface with drivers corresponding to I/O devices being monitored to gather data pertaining to those I/O devices. In some examples, the CIM providers are C++ classes, which define sets of objects and corresponding properties for use by the HMS 208, 214 to fetch data from the underlying physical resources 224, 226 (e.g., hardware I/O devices).

The PRM 518 of the illustrated example exposes a physical rack object and its associated sub-objects in a generic, vendor-neutral manner to the example LRM 520. Example sub-objects of the physical rack object include an example server object list 626 (e.g., a list of servers), an example switch object list 628 (e.g., a list of switches), and a storage object list 630 (e.g., a list of external storage). The example PRM 518 communicates with the example HMS 208, 214 using the example generic HMS service APIs 610 to manage physical resources (e.g., hardware) in the physical rack 202, 204, and to obtain information and inventory of physical resources available in the physical rack 202, 204. In the illustrated example, the HMS 208, 214 executes instructions from the PRM 518 that are specific to underlying physical resources based on the hardware management APIs 602 of those physical resources. That is, after the HMS 208, 214 receives an instruction via the generic HMS service APIs 610 from the PRM 518 that corresponds to an action on a particular physical resource in the physical rack 202, 204, the HMS 208, 214 uses the example hardware management APIs 602 to issue a corresponding instruction to the particular physical resource using a hardware management API of that particular physical resource. In this manner, the PRM 518 need not be configured to communicate with numerous different APIs of different physical resources in the physical rack 202, 204. Instead, the PRM 518 is configured to communicate with the HMS 208, 214 via the generic HMS service APIs 610, and the HMS 208, 214 handles communicating with numerous different, specific APIs of different physical resources through the example hardware management API 602. By using the generic HMS service APIs 610 for the PRM 518 to interface with and manage physical resources through the HMS 208, 214, the physical racks 202, 204 may be configured or populated with hardware from numerous different manufacturers without needing to significantly reconfigure the PRM 518. That is, even if such manufacturers require use of different APIs specific to their equipment, the HMS 208, 214 is configured to handle communications using such different APIs without changing how the PRM 518 uses the generic HMS service APIs 610 to communicate with the physical resources via the HMS 208, 214. Thus, the separation of the example generic HMS service APIs 610 from the example hardware management API 602 allows the HMS 208, 214 to integrate seamlessly with hardware from ODMs, OEMs, and other vendors independently of the generic HMS service APIs 610 provided by the HMS 208, 214 for use by the PRM 518 to manage such hardware.

The generic HMS service APIs 610 of the illustrated example support numerous Get/Set events so that the HMS 208, 214 can support requests from the PRM 518. Such Get/Set events work on software server and switch object properties. Example Get/Set events of the generic HMS service APIs 610 include:

-   PRM_HMS_ACK_HANDSHAKE ( )
-   PRM_HMS_GET_RACK_INVENTORY (Server Obj[ ], Switch Obj[ ], . . . )
-   PRM_HMS_GET_SERVER_OBJECT_PROP (Key, Value)
-   PRM_HMS_SET_SERVER_OBJECT_PROP (Key, Value)
-   PRM_HMS_GET_SWITCH_OBJECT_PROP (Key, Value)
-   PRM_HMS_SET_SWITCH_OBJECT_PROP (Key, Value)

In the above example Get/Set events of the generic HMS service APIs 610, the ‘Key’ is the property ID listed as part of the server/switch object properties. The example PRM_HMS_ACK_HANDSHAKE ( ) event API enables the PRM 518 to perform an acknowledgment-based handshake with the HMS 208, 214 to establish a connection between the PRM 518 and the HMS 208, 214. The example PRM_HMS_GET_RACK_INVENTORY (Server Obj[ ], Switch Obj[ ], . . . ) API enables the PRM 518 to request the HMS 208, 214 to provide the hardware inventory of the physical rack 202, 204. The example PRM_HMS_GET_SERVER_OBJECT_PROP (Key, Value) API enables the PRM 518 to request a server object property from the HMS 208, 214. For example, the PRM 518 provides the ‘Key’ identifying the requested server object property ID, and the HMS 208, 214 returns the ‘Value’ of the requested server object property. The example PRM_HMS_SET_SERVER_OBJECT_PROP (Key, Value) API enables the PRM 518 to set a server object property via the HMS 208, 214. For example, the PRM 518 provides the ‘Key’ identifying the target server object property ID, and provides the ‘Value’ to set for the target server object property. The example PRM_HMS_GET_SWITCH_OBJECT_PROP (Key, Value) API enables the PRM 518 to request a switch object property from the HMS 208, 214. For example, the PRM 518 provides the ‘Key’ identifying the requested switch object property ID, and the HMS 208, 214 returns the ‘Value’ of the requested switch object property. The example PRM_HMS_SET_SWITCH_OBJECT_PROP (Key, Value) API enables the PRM 518 to set a switch object property via the HMS 208, 214. For example, the PRM 518 provides the ‘Key’ identifying the target switch object property ID, and provides the ‘Value’ to set for the target switch object property.
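
For illustration only, the Get/Set events above could be modeled on the PRM side as a Java interface along the following lines. The examples above list the events by name only; the method signatures, parameter types (including the serverId/switchId arguments), and the RackInventory holder are assumptions made for this sketch.

```java
import java.util.List;

// Hypothetical Java rendering of the Get/Set events of the generic HMS service APIs 610.
interface GenericHmsServiceApi {
    void prmHmsAckHandshake();                                                  // PRM_HMS_ACK_HANDSHAKE ( )
    RackInventory prmHmsGetRackInventory();                                     // PRM_HMS_GET_RACK_INVENTORY (...)
    String prmHmsGetServerObjectProp(String serverId, String key);              // PRM_HMS_GET_SERVER_OBJECT_PROP
    void prmHmsSetServerObjectProp(String serverId, String key, String value);  // PRM_HMS_SET_SERVER_OBJECT_PROP
    String prmHmsGetSwitchObjectProp(String switchId, String key);              // PRM_HMS_GET_SWITCH_OBJECT_PROP
    void prmHmsSetSwitchObjectProp(String switchId, String key, String value);  // PRM_HMS_SET_SWITCH_OBJECT_PROP
}

// Minimal inventory holder standing in for Server Obj[ ] and Switch Obj[ ].
record RackInventory(List<String> serverObjects, List<String> switchObjects) { }
```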

The PRM 518 of the illustrated example registers a set of callbacks with the HMS 208, 214 that the PRM 518 is configured to use to receive communications from the HMS 208, 214. When the PRM callbacks are registered, the HMS 208, 214 invokes the callbacks when events corresponding to those callbacks occur. Example PRM callback APIs that may be registered by the PRM 518 as part of the generic HMS service APIs 610 include:

PRM Callback APIs

-   HMS_PRM_HOST_FAILURE (Server Obj[ ], REASON CODE)
-   HMS_PRM_SWITCH_FAILURE (Switch Obj[ ], REASON CODE)
-   HMS_PRM_MONITOR_SERVER_OBJECT (Key, Value, Update Frequency)
-   HMS_PRM_MONITOR_SWITCH_OBJECT (Key, Value, Update Frequency)

The example HMS_PRM_HOST_FAILURE (Server Obj[ ], REASON CODE) callback enables the HMS 208, 214 to notify the PRM 518 of a failure of a host (e.g., a physical server) in the physical rack 202, 204. The example HMS_PRM_SWITCH_FAILURE (Switch Obj[ ], REASON CODE) callback enables the HMS 208, 214 to notify the PRM 518 of a failure of a switch of the physical rack 202, 204. The example HMS_PRM_MONITOR_SERVER_OBJECT (Key, Value, Update Frequency) callback enables the HMS 208, 214 to send monitor updates to the PRM 518 about a server object. In the illustrated example, ‘Key’ identifies the server object to which the update corresponds, ‘Value’ includes the updated information monitored by the HMS 208, 214 for the server object, and ‘Update Frequency’ indicates the frequency with which the server object monitor update callbacks are provided by the HMS 208, 214 to the PRM 518. The example HMS_PRM_MONITOR_SWITCH_OBJECT (Key, Value, Update Frequency) callback enables the HMS 208, 214 to send monitor updates to the PRM 518 about a switch object. In the illustrated example, ‘Key’ identifies the switch object to which the update corresponds, ‘Value’ includes the updated information monitored by the HMS 208, 214 for the switch object, and ‘Update Frequency’ indicates the frequency with which the switch object monitor update callbacks are provided by the HMS 208, 214 to the PRM 518.
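
For illustration only, the callback registration pattern described above might look like the following in Java. The interface name, method signatures, and the registration mechanism are assumptions for the sketch; only the callback names mirror the list above.

```java
// Hypothetical callback contract the PRM 518 might register with the HMS 208, 214.
interface PrmCallbacks {
    void hmsPrmHostFailure(String[] serverObjects, int reasonCode);
    void hmsPrmSwitchFailure(String[] switchObjects, int reasonCode);
    void hmsPrmMonitorServerObject(String key, String value, long updateFrequencyMs);
    void hmsPrmMonitorSwitchObject(String key, String value, long updateFrequencyMs);
}

// HMS-side sketch: hold the registered callbacks and invoke them when events occur.
final class PrmCallbackRegistrationSketch {
    private PrmCallbacks registered;

    void registerCallbacks(PrmCallbacks callbacks) {
        this.registered = callbacks;
    }

    // Example trigger: a host failure detected during OOB monitoring.
    void onHostFailureDetected(String serverId, int reasonCode) {
        if (registered != null) {
            registered.hmsPrmHostFailure(new String[] { serverId }, reasonCode);
        }
    }
}
```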

The example generic HMS service APIs 610 provide non-maskable event types for use by the HMS 208, 214 to notify the PRM 518 of failure scenarios in which the HMS 208, 214 cannot continue to function.

Non-Maskable Event HMS APIs

-   HMS_SOFTWARE_FAILURE (REASON CODE)
-   HMS_OUT_OF_RESOURCES (REASON CODE)

The example HMS_SOFTWARE_FAILURE (REASON CODE) non-maskable event API enables the HMS 208, 214 to notify the PRM 518 of a software failure in the HMS 208, 214. The example HMS_OUT_OF_RESOURCES (REASON CODE) non-maskable event API enables the HMS 208, 214 to notify the PRM 518 when the HMS 208, 214 is out of physical resources.

The HMS 208, 214 provides the example hardware management APIs 602 for use by the example generic HMS service APIs 610 so that the HMS 208, 214 can communicate with the physical resources 224, 226 based on instructions received from the PRM 518 via the generic HMS service APIs 610. The hardware management APIs 602 of the illustrated example interface with physical resource objects using their corresponding management interfaces, some of which may be vendor-specific interfaces. For example, the HMS 208, 214 uses the hardware management APIs 602 to maintain managed server, switch, and storage software object properties. Example hardware management APIs 602 for accessing server objects are shown below in Table 2.

TABLE 2 Server Hardware Management APIs

API | Return Value | Description
DISCOVER_SERVER_INVENTORY( ) | Node object list | Used to discover all servers in a rack. A Node Object identifies a server hardware node (Node ID, MAC Address, Management IP Address). Homogeneous hardware assumption; board information required for hardware identification to attach to the right plugin.
GET_CHASSIS_SERIAL_NUMBER(NODE_OBJECT) | Chassis identifier | Used to get chassis serial number
GET_BOARD_SERIAL_NUMBER(NODE_OBJECT) | Board serial number | Used to get board identifier
GET_MANAGEMENT_MAC_ADDR(NODE_OBJECT) | MAC address | Used to get MAC address of management port
SET_MANAGEMENT_IP_ADDR(NODE_OBJECT, IPADDR) | RC (Success/Error Code) | Used to set management IP address
GET_CPU_POWER_STATE(NODE_OBJECT) | CPU powerstate | Used to get current power state [S0-S5] of CPU
SET_CPU_POWER_STATE(NODE_OBJECT, POWERSTATE) | RC | Used to set CPU power state
SET_SERVER_POWER_STATE(ON/OFF/CYCLE/RESET) | RC | Used to power on, power off, power cycle, or reset a server. Cold reset: BMC reset, run Self Test. Warm Reset: No Self Test.
GET_SERVER_CPU_PROPERTIES(NODE_OBJECT, CPU_OBJECT) | RC | Used to get CPU specific information
SET_SERVER_CPU_PROPERTIES(NODE_OBJECT, CPU_OBJECT) | RC | Used to set CPU properties
GET_SERVER_MEMORY_PROPERTIES(NODE_OBJECT, MEM_OBJECT) | RC | Used to get memory properties
GET_SERVER_NETWORKCONTROLLER_PROPERTIES(NODE_OBJECT, NETWORKCONTROLLER_OBJECT[ ]) | RC | Used to get network controller properties including LOM, NICs
SET_SERVER_NETWORKCONTROLLER_PROPERTIES(NODE_OBJECT, NETWORKCONTROLLER_OBJECT[ ]) | RC | Used to set NIC properties
GET_SERVER_DISK_PROPERTIES(NODE_OBJECT, DISK_OBJECT[ ]) | RC | Used to get disk properties
SET_SERVER_DISK_PROPERTIES(NODE_OBJECT, DISK_OBJECT[ ]) | RC | Used to set disk properties
GET_SERVER_DISK_SMART_DATA(NODE_OBJECT, SMART_OBJECT) | RC | Used to get SMART data for disk
SET_SERVER_SENSOR(NODE_OBJECT, SENSOR, VALUE, THRESHOLD) | RC | Used to set sensors for CPU/Memory/Power/HDD
GET_SENSOR_STATUS(NODE_OBJECT, SENSOR, VALUE, UNITS, THRESHOLD) | RC | Used to get sensor data
GET_SYSTEM_EVENT_LOG_DATA( . . . ) | | Used to get System event log data
UPDATE_CPU_FIRMWARE(FILE . . . ) | | Update CPU firmware
UPDATE_DISK_FIRMWARE(FILE . . . ) | | Update Disk firmware
UPDATE_NIC_FIRMWARE(FILE . . . ) | | Update NIC firmware
SET_CHASSIS_IDENTIFICATION(NODE_OBJECT, ON/OFF, NUMSECS) | | LED/LCD/BEEP
SET_BOOTOPTION(NODE_OBJECT, TYPE) | RC | Used to set boot option SSD/PXE
GET_BOOTOPTION(NODE_OBJECT) | BOOT TYPE | Used to get boot option
SET_CREATE_USER(NODE_OBJECT, USEROBJECT) | RC | Used to create a management user
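
For illustration only, a vendor-specific plugin behind the hardware management APIs 602 could be modeled on a few of the Table 2 entries as follows. The interface, enum, and NodeObject record are hypothetical shapes created for this sketch; only the commented API names come from Table 2.

```java
import java.util.List;

// Hypothetical vendor plugin contract modeled on a subset of the Table 2 server
// hardware management APIs. Method names and types are illustrative only.
interface ServerHardwarePlugin {
    List<NodeObject> discoverServerInventory();                     // DISCOVER_SERVER_INVENTORY( )
    String getChassisSerialNumber(NodeObject node);                 // GET_CHASSIS_SERIAL_NUMBER
    boolean setManagementIpAddr(NodeObject node, String ipAddr);    // SET_MANAGEMENT_IP_ADDR
    boolean setServerPowerState(NodeObject node, PowerAction act);  // SET_SERVER_POWER_STATE
}

enum PowerAction { ON, OFF, CYCLE, RESET }

// A Node Object identifies a server hardware node (Node ID, MAC address, management IP).
record NodeObject(String nodeId, String macAddress, String managementIp) { }
```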

Example hardware management APIs 602 for accessing switch objects are shown below in Table 3.

TABLE 3 Switch Hardware Management APIs

API | Return Value | Description
GET_CHASSIS_SERIAL_ID(NODE_OBJECT) | CHASSIS_IDENTIFIER | Used to identify a ToR switch chassis
GET_MANAGEMENT_MAC(NODE_OBJECT) | MAC_ADDRESS | API to get management port MAC address
SET_MANAGEMENT_IP(NODE_OBJECT, IPADDR) | RC | API to set management IP address
GET_SWITCH_INVENTORY(NODE_OBJECT) | SWITCH_INVENTORY | Used to get switch hardware inventory (HW, power supply, fans, transceiver, etc.)
SWITCH_REBOOT(NODE_OBJECT) | RC | Used to reboot the switch
CREATE_SWITCH_USER(NODE_OBJECT, USER_OBJECT) | RC | Used to create a management user
GET_SWITCH_VERSION(NODE_OBJECT) | VERSION_OBJECT | Used to get hardware and software version details
GET_SWITCH_HW_PLATFORM(NODE_OBJECT) | HARDWARE_CHIPSET_OBJECT | Used to get the switching ASIC information
APPLY_SWITCH_CONFIGURATION(NODE_OBJECT, CONFIG_FILE) | CONFIG_STATUS_OBJECT | Used to apply running configuration on a switch
DELETE_SWITCH_CONFIGURATION(NODE_OBJECT) | RC | Used to delete startup switch configuration
SET_LOG_LEVELS(NODE_OBJECT, LOG_LEVEL) | RC | Used to set log levels for alert, events, and debug from the switch
GET_SWITCH_ENVIRONMENT(NODE_OBJECT, POWER_OBJ, COOLING_OBJ, TEMPERATURE_OBJ) | RC | Used to get environmental information from the switch for power, fans, and temperature
SET_LOCATOR_LED(NODE_OBJECT) | RC | Used to set locator LED of switch
GET_INTERFACE_COUNTERS(NODE_OBJECT, INT_OBJECT) | RC | Used to collect interface statistics
GET_INTERFACE_ERRORS(NODE_OBJECT, INT_OBJECT) | RC | Used to collect errors on switch interfaces
GET_INTERFACE_STATUS(NODE_OBJECT, INT_OBJECT) | RC | Used to get interface status
SET_INTERFACE_STAUS(NODE_OBJECT, INT_OBJECT) | RC | Used to set interface status
GET_INTERFACE_PHY_STATUS(NODE_OBJECT, INT_OBJECT) | RC | Used to get physical status of interface
GET_INTERFACE_SPEED(NODE_OBJECT, INT_OBJECT) | RC | Used to get the speed/auto negotiation mode
GET_VLAN_SUMMARY(NODE_OBJECT, VLAN_OBJECT) | RC | Get VLAN information: number of VLANs in use and ports connected to
GET_VLAN_COUNTERS(NODE_OBJECT, VLAN_OBJECT) | RC | Get VLAN-specific counters
GET_VXLAN_TABLE(NODE_OBJECT, VXLAN_TABLE) | RC | VXLAN address table
GET_VXLAN_COUNTERS(NODE_OBJECT, VXLAN_OBJECT) | RC | VXLAN-specific counters
CLEAR_VLAN_COUNTERS | RC | Clear VLAN counters
CLEAR_VXLAN_COUNTERS | RC | Clear VXLAN counters
MONITOR_LINK_FLAPS(NODE_OBJECT, INT_OBJECT) | RC | Monitor link flaps; L3/MLAG/LAG status
SET_PORT_MTU(NODE_OBJECT, MTU) | RC | Set port MTU
SWITCH_OS_UPGRADE(FILE *) | RC | Ability to upgrade the OS on the switch

In the illustrated example of FIG. 6, the PRM 518 maintains an example generic pRack object 624. The example generic pRack object 624 persists a list of the physical resources 224, 226 returned by the HMS 208, 214 and classified according to object types. The example generic pRack object 624 includes the following pRack object definition.

pRACK Object

-   Rack ID (Logical, Provided by VRM 225, 227)
-   Manufacturer ID ( )
-   Number Server Objects
-   Server Object List 626
-   Switch Object List 628
-   HMS heartbeat timestamp

In the pRack object definition above, the Rack ID is the logical identifier of the virtual server rack 206 (FIG. 2). The Manufacturer ID ( ) returns the identifier of the system integrator described in connection with FIG. 1 that configured the virtual server rack 206. The ‘Number Server Objects’ element stores the number of server objects configured for the virtual server rack 206. The ‘Server Object List’ 626 element stores a listing of server objects configured for the virtual server rack 206. The ‘Switch Object List’ 628 element stores a listing of switch objects configured for the virtual server rack 206. The ‘HMS heartbeat timestamp’ element stores timestamps of when the operational status (e.g., heartbeat) of the virtual server rack 206 is checked during periodic monitoring of the virtual server rack 206.
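
For illustration only, the pRack object definition above could take the following shape as a Java data holder. The field types (strings, lists, timestamps) are assumptions made for the sketch; the definition above names the elements without prescribing types.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Illustrative Java shape for the generic pRACK object 624 maintained by the PRM 518.
final class PRackObject {
    String rackId;                                                    // logical ID provided by the VRM 225, 227
    String manufacturerId;                                            // system integrator identifier
    int numberServerObjects;                                          // count of configured server objects
    final List<String> serverObjectList = new ArrayList<>();          // Server Object List 626
    final List<String> switchObjectList = new ArrayList<>();          // Switch Object List 628
    final List<Instant> hmsHeartbeatTimestamps = new ArrayList<>();   // heartbeat check timestamps
}
```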

The example PRM 518 provides the LRM APIs 606 for use by the LRM 520 (FIG. 5) to access the above elements of the pRack object 624. In examples disclosed herein, the PRM 518 and the LRM 520 run in the same application. As such, the PRM 518 and the LRM 520 communicate with each other using local inter-process communication (IPC). Examples of Get/Set event APIs of the LRM APIs 606 include:

Get/Set Event LRM APIs

-   LRM_PRM_RECIEVE_HANDSHAKE_ACK ( )
-   LRM_PRM_GET_RACK_OBJECT (PRM_RACK_OBJECT [ ])
-   LRM_PRM_SET_SERVER_OBJECT_PROP (Key, Value)
-   LRM_PRM_GET_SERVER_STATS (Available, InUse, Faults)
-   LRM_PRM_SET_SERVER_CONFIG (SERVER_CONFIG_BUFFER)
-   LRM_PRM_SET_SWITCH_ADV_CONFIG (SWITCH_CONFIG_BUFFER)

In the Get/Set Event LRM APIs, the example LRM_PRM_RECIEVE_HANDSHAKE_ACK ( ) API may be used by the LRM 520 to establish a connection between the LRM 520 and the PRM 518. The example LRM_PRM_GET_RACK_OBJECT (PRM_RACK_OBJECT [ ]) API may be used by the LRM 520 to obtain an identifier of the rack object corresponding to the virtual server rack 206. The example LRM_PRM_SET_SERVER_OBJECT_PROP (Key, Value) API may be used by the LRM 520 to set a server object property via the PRM 518. For example, the LRM 520 provides the ‘Key’ identifying the target server object property ID, and provides the ‘Value’ to set for the target server object property. The example LRM_PRM_GET_SERVER_STATS (Available, InUse, Faults) API may be used by the LRM 520 to request, via the PRM 518, operational status of servers of the physical resources 224, 226. For example, the PRM 518 may return an ‘Available’ value indicative of how many servers in the physical resources 224, 226 are available, may return an ‘InUse’ value indicative of how many servers in the physical resources 224, 226 are in use, and may return a ‘Faults’ value indicative of how many servers in the physical resources 224, 226 are in a fault condition. The example LRM_PRM_SET_SERVER_CONFIG (SERVER_CONFIG_BUFFER) API may be used by the LRM 520 to set configuration information in servers of the physical resources 224, 226. For example, the LRM 520 can pass a memory buffer region by reference in the ‘SERVER_CONFIG_BUFFER’ parameter to indicate a portion of memory that stores configuration information for a server. The example LRM_PRM_SET_SWITCH_ADV_CONFIG (SWITCH_CONFIG_BUFFER) API may be used by the LRM 520 to set configuration information in switches of the physical resources 224, 226. For example, the LRM 520 can pass a memory buffer region by reference in the ‘SWITCH_CONFIG_BUFFER’ parameter to indicate a portion of memory that stores configuration information for a switch.

The LRM 520 of the illustrated example registers a set of callbacks with the PRM 518 that the LRM 520 is configured to use to receive communications from the PRM 518. When the LRM callbacks are registered, the PRM 518 invokes the callbacks when events corresponding to those callbacks occur. Example callbacks that may be registered by the LRM 520 include:

LRM Callback APIs

-   PRM_LRM_SERVER_DOWN (SERVER_ID, REASON_CODE)
-   PRM_LRM_SWITCH_PORT_DOWN (SERVER_ID, REASON_CODE)
-   PRM_LRM_SERVER_HARDWARE_FAULT (SERVER_ID, REASON_CODE)

The example PRM_LRM_SERVER_DOWN (SERVER_ID, REASON_CODE) callback API enables the PRM 518 to notify the LRM 520 when a server is down. The example PRM_LRM_SWITCH_PORT_DOWN (SERVER_ID, REASON_CODE) callback API enables the PRM 518 to notify the LRM 520 when a switch port is down. The example PRM_LRM_SERVER_HARDWARE_FAULT (SERVER_ID, REASON_CODE) callback API enables the PRM 518 to notify the LRM 520 when a server hardware fault has occurred.

The example LRM APIs 606 also provide non-maskable event types for use by the PRM 518 to notify the LRM 520 of failure scenarios in which the PRM 518 cannot continue to function.

Non-Maskable Event LRM APIs

-   PRM_SOFTWARE_FAILURE (REASON_CODE)
-   PRM_OUT_OF_RESOURCES (REASON_CODE)

The example PRM_SOFTWARE_FAILURE (REASON_CODE) non-maskable event API enables the PRM 518 to notify the LRM 520 when a software failure has occurred. The example PRM_OUT_OF_RESOURCES (REASON_CODE) non-maskable event API enables the PRM 518 to notify the LRM 520 when the PRM 518 is out of resources.

An example boot process of the virtual server rack 206 (FIGS. 2 and 4) includes an HMS bootup sequence, a PRM bootup sequence, and an HMS-PRM initial handshake. In an example HMS bootup sequence, when the management switch 207, 213 on which the HMS 208, 214 runs is powered on and the OS of the management switch 207, 213 is up and running, a bootstrap script to initialize the HMS 208, 214 is executed to fetch and install an HMS agent software installer on the management switch 207, 213 to instantiate the HMS 208, 214. The HMS agent software installer completes the install and initialization of the HMS agent software bundle and starts the HMS agent daemon to instantiate the HMS 208, 214. When the HMS agent daemon is started, the HMS 208, 214 determines the inventory of the physical resources 224, 226 of the physical rack 202, 204. It does this by using an IPMI discover API, which sends broadcast remote management control protocol (RMCP) pings to discover IPMI-capable nodes (e.g., nodes of the physical resources 224, 226) on a known internal subnet. In such examples, management IP addresses for server nodes (e.g., server nodes of the physical resources 224, 226) and ToR switches (e.g., ToR switches 210, 212, 216, 218) will be known a priori and published for the HMS 208, 214 to discover as internal DHCP address ranges. For example, the server hosts and the ToR switches 210, 212, 216, 218 may be assigned IP addresses using a DHCP server running on the same management switch 207, 213 that runs the HMS 208, 214.

In an example PRM bootup sequence, the PRM 518 boots up as part of the VRM 225, 227. The example VRM 225, 227 initiates the PRM 518 process. During bootup, the example PRM 518 creates an empty physical rack object and waits for the HMS 208, 214 to initiate an HMS-PRM initial handshake. When the HMS-PRM initial handshake is successful, the example PRM 518 queries the HMS 208, 214 for the physical inventory (e.g., the inventory of the physical resources 224, 226) in the physical rack 202, 204. The PRM 518 then populates the physical rack object based on the physical inventory response from the HMS 208, 214. After the HMS-PRM initial handshake with the HMS 208, 214 and after the physical rack object initialization is complete, the example PRM 518 sends a message to the LRM 520 to indicate that the PRM 518 is ready for accepting requests. However, if initialization does not succeed after a certain time period, the example PRM 518 notifies the LRM 520 that the pRack initialization has failed.
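
For illustration only, the PRM bootup flow above could be sketched in Java as follows. The HmsConnection and LrmNotifier interfaces, the string-based inventory, and the five-minute timeout are assumptions made for the sketch; the flow itself (wait for handshake, query inventory, populate, notify, or report failure) follows the sequence described above.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Sketch of the PRM bootup sequence: empty rack object, handshake, inventory, notify LRM.
final class PrmBootupSketch {
    final List<String> serverObjects = new ArrayList<>();   // stands in for the empty physical rack object
    final List<String> switchObjects = new ArrayList<>();

    void bootUp(HmsConnection hms, LrmNotifier lrm) {
        boolean handshakeOk = hms.waitForHandshake(Duration.ofMinutes(5)); // hypothetical timeout
        if (!handshakeOk) {
            lrm.notifyPrackInitializationFailed();           // initialization did not succeed in time
            return;
        }
        serverObjects.addAll(hms.queryServerInventory());    // populate rack object from HMS inventory
        switchObjects.addAll(hms.querySwitchInventory());
        lrm.notifyPrmReady();                                // PRM is ready to accept requests
    }
}

interface HmsConnection {
    boolean waitForHandshake(Duration timeout);
    List<String> queryServerInventory();
    List<String> querySwitchInventory();
}

interface LrmNotifier {
    void notifyPrmReady();
    void notifyPrackInitializationFailed();
}
```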

In examples disclosed herein, the HMS 208, 214 initiates the HMS-PRM initial handshake during the PRM bootup sequence to establish a connection with the PRM 518. In examples disclosed herein, when the VM hosting the VRM 225, 227 is up and running, the VM creates a virtual NIC for the internal network of the virtual server rack 206 and assigns an IP address to that virtual NIC of the internal network. The ToR switch 210, 212, 216, 218 discovers how to reach and communicate with the internal network of the VRM 225, 227 when the VM hosting the VRM 225, 227 powers on. In examples disclosed herein, a management port of the management switch 207, 213 is connected to the ToR switches 210, 212, 216, 218. The management port is used to manage the ToR switches 210, 212, 216, 218. In addition, the management switch 207, 213 is connected to the ToR switches 210, 212, 216, 218 over data ports and communicates using an internal VLAN network. The example VRM 225, 227 and the HMS 208, 214 can then communicate based on a predefined IP address/port number combination. For example, the HMS 208, 214 initiates the HMS-PRM initial handshake by sending a message to the predefined IP address/port number combination of the PRM 518, and the PRM 518 responds with an acknowledge (ACK) to the message from the HMS 208, 214 to complete the HMS-PRM initial handshake.
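
For illustration only, the HMS side of the handshake just described could be sketched as a simple client that sends a message to the predefined IP address/port combination and waits for an ACK. The address, port, and wire format ("HMS_HELLO"/"ACK") are placeholders invented for the sketch; the description above specifies only a predefined IP address/port combination and an acknowledgment.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Minimal sketch of the HMS initiating the HMS-PRM initial handshake over TCP.
public final class HmsHandshakeSketch {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("192.168.0.10", 8445);  // placeholder predefined PRM IP/port
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true, StandardCharsets.UTF_8);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
            out.println("HMS_HELLO");                            // handshake request (hypothetical message)
            String reply = in.readLine();
            if ("ACK".equals(reply)) {
                System.out.println("HMS-PRM initial handshake complete");
            }
        }
    }
}
```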

After the HMS bootup sequence, the HMS 208, 214 performs an initial discovery process in which the HMS 208, 214 identifies servers, switches, and/or any other hardware in the physical resources 224, 226 in the physical rack 202, 204. The HMS 208, 214 also identifies hardware configurations and topology of the physical resources in the physical rack 202, 204. To discover servers in the physical resources 224, 226, the example HMS 208, 214 uses IPMI-over-LAN, which uses the RMCP/RMCP+ Remote Management Control Protocol defined by the DMTF. In examples disclosed herein, RMCP uses port 623 as the primary RMCP port and port 664 as a secure auxiliary port, which uses encrypted packets for secure communications. The example HMS 208, 214 uses an RMCP broadcast request on a known subnet to discover IPMI LAN nodes. In addition, the HMS 208, 214 uses the RMCP presence ping message to determine IPMI-capable interfaces in the physical rack 202, 204. In this manner, by discovering IPMI LAN nodes and IPMI-capable interfaces, the HMS 208, 214 discovers servers present in the physical resources 224, 226.

To discover switches in the physical resources 224, 226, a DHCP server running on the management switch 207, 213 assigns management IP addresses to the ToR switches 210, 212, 216, 218. In this manner, the HMS 208, 214 can detect the presence of the ToR switches 210, 212, 216, 218 in the physical rack 202, 204 based on the management IP addresses assigned by the DHCP server.

To maintain topology information of the management network in the virtual server rack 206, a link layer discovery protocol (LLDP) is enabled on management ports of the discovered server nodes and ToR switches 210, 212, 216, 218. The example management switch 207, 213 monitors the LLDP packet data units (PDUs) received from all of the discovered server nodes and keeps track of topology information. The example HMS 208, 214 uses the topology information to monitor for new servers that are provisioned in the physical resources 224, 226 and for de-provisioning of servers from the physical resources 224, 226. The example HMS 208, 214 also uses the topology information to monitor server hosts of the physical resources 224, 226 for misconfigurations.

The example HMS 208, 214 is capable of power-cycling individual IPMI-capable server hosts in the physical resources 224, 226 of the physical rack 202, 204. For example, the HMS 208, 214 sends SYS POWER OFF and SYS POWER ON messages to the BMCs on boards of target server hosts via LAN controllers of the target server hosts. The LAN controllers for the management ports of server hosts are powered on using stand-by power and remain operative when the virtual server rack 206 is powered down. In some examples, the LAN controller is embedded in the system. In other examples, the LAN controller is an add-in PCI card connected to the BMC via a PCI management bus connection.

To hard reset a switch (e.g., the ToR switches 210, 212, 216, 218), the HMS 208, 214 uses IP-based access to power supplies of the physical rack 202, 204. For example, the HMS 208, 214 can hard reset a switch when it is non-responsive such that an in-band power cycle is not possible via the switch's CLI.

During a power cycle, OS images that are pre-stored (e.g., pre-flashed) in the servers and switches of the physical resources 224, 226 are bootstrapped by default. As part of the bootstrap procedure, the HMS 208, 214 points the boot loader to the server or switch image located on a memory device (e.g., a flash memory, a magnetic memory, an optical memory, a Serial Advanced Technology Attachment (SATA) Disk-on-Module (DOM), etc.) and provides the boot loader with any additional parameters pertinent to the bootup of a booting server or switch. For instances in which a network-based boot is required, the HMS 208, 214 is capable of altering boot parameters to use PXE boot for servers and Trivial File Transfer Protocol (TFTP)/Open Network Install Environment (ONIE) for switches.

In examples disclosed herein, after the boot up process, the HMS 208, 214 validates that server nodes and the ToR switches 210, 212, 216, 218 have been properly bootstrapped with correct OS images and are ready to be declared functional. The example HMS 208, 214 does this by logging in to the server hosts, validating the OS versions, and analyzing the logs of the server hosts for any failures during bootup. In examples disclosed herein, the HMS 208, 214 also runs basic operability/configuration tests as part of the validation routine. In some examples, the HMS 208, 214 performs a more exhaustive validation to confirm that all loaded drivers are compliant with a hardware compatibility list (HCL) provided by, for example, the virtual system solutions provider 110 (FIG. 1). The example HMS 208, 214 also runs a switch validation routine as part of a switch thread to verify that the boot configurations for the ToR switches 210, 212, 216, 218 are applied. For example, the HMS 208, 214 validates the OS versions in the ToR switches 210, 212, 216, 218 and tests ports by running link tests and ping tests to confirm that all ports are functional. In some examples, the HMS 208, 214 performs more exhaustive tests such as bandwidth availability tests, latency tests, etc.
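
For illustration only, one small piece of such a validation routine, the ping test, could be sketched as follows. The host addresses are placeholders, and a full validation as described above would also log in to hosts, check OS versions, analyze boot logs, and run link tests.

```java
import java.io.IOException;
import java.util.List;

// Sketch of a post-boot reachability check: ping each bootstrapped node.
public final class BootValidationSketch {
    private static boolean ping(String host) throws IOException, InterruptedException {
        // -c 3: send three echo requests (Linux ping syntax); exit code 0 means reachable.
        Process p = new ProcessBuilder(List.of("ping", "-c", "3", host)).start();
        return p.waitFor() == 0;
    }

    public static void main(String[] args) throws Exception {
        for (String host : List.of("10.0.0.11", "10.0.0.12", "10.0.0.13")) { // placeholder server hosts
            System.out.println(host + (ping(host) ? " reachable" : " FAILED ping test"));
        }
    }
}
```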

An example definition of an example server object 632 for use in connection with examples disclosed herein is shown below in Table 4. The example server object 632 defined in Table 4 encapsulates information obtained both statically and dynamically using IB/CIM and OOB/IPMI mechanisms. In examples disclosed herein, the static information is primarily used for resource provisioning, and the dynamic information is used for monitoring status and health of hardware using upper layers in the VRM 225, 227. In some examples, the PRM 518 does not store events or alarms. In such examples, the PRM 518 relays information pertinent to events or alarms to the VRM 225, 227 and/or a Log Insight module (e.g., a module that provides real-time log management for virtual environments).

TABLE 4 Example Definition of Server Object

-   IPMI Device ID
-   MAC address of Management Port
-   IP Address
-   vRACK Server ID (P0, H0) [Physical Rack 0, Host 0]
-   Hardware Model
-   Power State (On/Off)
-   CPU: Vendor, Frequency, Cores, HT, Errors
-   Memory: Size, Type, Vendor, ECC, Cache size, Status, Errors
-   Disk[x]: Vendor, Type, Capacity, Driver, Status, Errors
-   NIC[x]: Type (1G/10G/40G), NumPorts, Vendor, Driver, Linkstate, TOR Port (P0, S0, X0) (Port number connected on the ToR switch), Status, Errors
-   Sensors: Temperature, Power
-   Provisioned (Yes/No)
-   Boot State (Yes/No)
-   OS Version
-   Firmware Version
-   BIOS Version
-   License
-   HCL compliant
-   Timestamps[ ]: Lastboot
-   Fault Domain Group

An example definition of an example switch object 634 for use in connection with examples disclosed herein is shown below in Table 5. The example switch object 634 defined in Table 5 encapsulates both static and dynamic information. In examples disclosed herein, the static information is primarily used to make sure that network resources are available for a provisioned server host. Also in examples disclosed herein, the dynamic information is used to monitor health of the provisioned physical network. Also in examples disclosed herein, a configuration information buffer is used for switch-specific configurations.

TABLE 5 Example Definition of Switch Object

-   Chassis ID
-   MAC Address of Management Port
-   Management IP Address
-   vRACK Switch ID (P0, S0) [Physical Rack 0, Switch 0]
-   Hardware Model
-   Power State (On/Off)
-   Provisioned (Yes/No)
-   Boot State (Yes/No)
-   Switch Ports[X]: Speed [1G/10G/40G/100G], Link State [Up/Down], Host Port [P0, H0, N1] [Port identifier of the host], Historical Stats[ ] (In/Out Packets, In/Out Drops)
-   OS Version
-   Firmware Version
-   Timestamps: Lastboot
-   Fault Domain Group
-   Switch Configuration File Static [Vendor Type] (This is a vendor-specific configuration file. This property points to a text file name having a switch configuration. This is bundled as part of the HMS Application (e.g., used to run the HMS 208, 214). The Static Switch Configuration File lists commands to be applied and also files to be copied (e.g., pointers to configuration-specific files).)
-   Switch Configuration File Dynamic [Vendor Type] (This is a vendor-specific configuration file. This property points to a text file name having a switch configuration. The Dynamic Switch Configuration File is downloaded at runtime from the PRM 518 of the VRM 225, 227.)

In examples disclosed herein, example server properties managed by the HMS 208, 214 are shown in Table 6 below.

TABLE 6 Server Properties Table

Property | OOB | IB | Use
Chassis Serial Number | Y | | Used to identify inventory
Board Serial Number | Y | | Same as above; second level check
Management MAC | Y | | Chassis identifier on the network
Management IP | Y | | Network connectivity to management port
Power State [S0-S5] | Y | | [Low Priority] Only if there is a power surge while provisioning we can set server low power states.
Power ON/OFF/Power Cycle/Reset | Y | | Ability to power on and off servers
CPU (Cores, Frequency) | | Y | Use as input for workload resource requirements
Memory (Size, Speed, Status) | | Y | As above
NIC (Speed, Link Status, Firmware Version, MAC Address, PCI Device ID, PCI SBF, HW capabilities: TSO, LRO, VXLAN offloads, CSUM, DCB, IPV6 CSUM) | Partial | Y | As above (OOB can get MAC address)
DISK (Size, Device Availability, Status, Vendor, Model, Type, DeviceID, Driver version, Firmware version) | Partial | Y | As above (OOB has HDD status sensors described in Sensors)
SMART data for Disks (Self-Monitoring, Analysis, and Reporting) Value/Threshold: Health Status, Media Wearout Indicator, Write Error Count, Read Error Count, Power-on Hours, Power Cycle Count, Raw Read Error Rate, Drive Temperature, Driver Rated Max Temperature, Initial Bad Block Count, SSD specific wearlevelling indicators | N | Y | Resiliency algorithm input
CPU Firmware version | Y | | Check for updated versions
CPU Firmware upgrade | Y | | Ability to upgrade CPU firmware
BIOS upgrade | Y | | Ability to upgrade BIOS
Sensors (CPU/Memory/Power/HDD): Processor Status (Thermal Trip, used to identify cause of server reset), CATERR processor, DIMM Thermal Trip (same as above), Hang in POST failure (Processor Status in case of unresponsive CPU), HDD Status, Firmware update status, Power Unit Status (Power Down), BMC self test | Y | | HW analytics/OAM
POST tests: Microcode update failed, POST errors are logged to SEL, Processor init fatal errors, DIMM major failures, DIMM disabled, DIMM SPD failure, BIOS corrupted, PCIe PERR Parity errors, PCIe resource conflict, NVRAM corruptions, Processor BIST failures, BMC controller failed, ME failure (Grizzly Pass Technical Product Specification Appendix E has all the POST errors) | Y | | Used for HW validation
System Event Logs [SEL]: DIMM Thermal Margin critical threshold, Power Supply Status (failure detected, predictive failure), Processor Thermal Margin critical threshold, NIC controller temperature critical threshold, SAS module temperature critical threshold | Y | | LogInsight/HW Analytics; log events for critical hardware failures and critical thresholds
User Name/Password credentials for OOB access | Y | | Create user for BMC access
NIC Firmware update | N | Y | Firmware updates use the NIC drivers
SSD firmware update | N | Y | SSD driver dependency

In examples disclosed herein, example switch properties managed by the HMS 208, 214 are shown in Table 7 below.

TABLE 7 Switch Properties Table
Property: Chassis Serial Number; Use: Identify inventory
Property: Management Port MAC; Use: Network identity of TOR
Property: Management Port IP address; Use: Provide network reachability to TOR
Property: Port Properties [Num Ports] (Admin Status, Link Status, Port Type); Use: Use as input for workload resource requirements
Property: Port Statistics; Use: Calculate in-use and free bandwidth and identify choke points using drop counters and buffer statistics
Property: OS version; Use: Use for upgrades

Further details of the example HMS 208, 214 of FIGS. 2, 3, 4, 5, and/or 6 are disclosed in U.S. patent application Ser. No. 14/788,004, filed on Jun. 30, 2015, (Attorney Docket No. C150.02), and titled “METHODS AND APPARATUS TO CONFIGURE HARDWARE MANAGEMENT SYSTEMS FOR USE IN VIRTUAL SERVER RACK DEPLOYMENTS FOR VIRTUAL COMPUTING ENVIRONMENTS,” which is hereby incorporated by reference herein in its entirety. Further details of the example VRMs 225, 227 of FIGS. 2, 4, and/or 5 are also disclosed in U.S. patent application Ser. No. 14/796,803, filed on Jul. 10, 2015, and titled “Methods and Apparatus to Configure Virtual Resource Managers for use in Virtual Server Rack Deployments for Virtual Computing Environments,” which is hereby incorporated by reference herein in its entirety. In addition, U.S. patent application Ser. No. 14/788,193, filed on Jun. 30, 2015, (Attorney Docket No. C150.03), and titled “METHODS AND APPARATUS TO RETIRE HOSTS IN VIRTUAL SERVER RACK DEPLOYMENTS FOR VIRTUAL COMPUTING ENVIRONMENTS,” and U.S. patent application Ser. No. 14/788,210, filed on Jun. 30, 2015, (Attorney Docket No. C150.04), and titled “METHODS AND APPARATUS TO TRANSFER PHYSICAL HARDWARE RESOURCES BETWEEN VIRTUAL RACK DOMAINS IN A VIRTUALIZED SERVER RACK” are hereby incorporated by reference herein in their entireties.

While an example manner of implementing the example VRM 225, 227 of FIG. 2 is illustrated in FIGS. 2, 4 and 5, one or more of the elements, processes and/or devices illustrated in FIGS. 2, 4 and/or 5 may be combined, divided, re-arranged, omitted, eliminated and/or implemented in any other way. Further, the example workflow services engine 514, the example resource aggregation and correlations engine 516, the example physical resource manager 518, the example logical resource manager 520, the example broadcasting and election manager 522, the example security manager 524, the example asset inventory and license manager 526, the example logical object generation engine 528, the example event process manager 530, the example virtual rack manager directory 532, the example extensibility tools 534, the example configuration component services 536, the VRM configuration component 538, the example configuration UI 540, and/or, more generally, the example VRM 225, 227 of FIGS. 2, 4, and/or 5 may be implemented by hardware, software, firmware and/or any combination of hardware, software and/or firmware. Thus, for example, any of the example workflow services engine 514, the example resource aggregation and correlations engine 516, the example physical resource manager 518, the example logical resource manager 520, the example broadcasting and election manager 522, the example security manager 524, the example asset inventory and license manager 526, the example logical object generation engine 528, the example event process manager 530, the example virtual rack manager directory 532, the example extensibility tools 534, the example configuration component services 536, the VRM configuration component 538, the example configuration UI 540, and/or, more generally, the example VRM 225, 227 of FIGS. 2, 4, and/or 5 could be implemented by one or more analog or digital circuit(s), logic circuits, programmable processor(s), application specific integrated circuit(s) (ASIC(s)), programmable logic device(s) (PLD(s)) and/or field programmable logic device(s) (FPLD(s)). When reading any of the apparatus or system claims of this patent to cover a purely software and/or firmware implementation, at least one of the example workflow services engine 514, the example resource aggregation and correlations engine 516, the example physical resource manager 518, the example logical resource manager 520, the example broadcasting and election manager 522, the example security manager 524, the example asset inventory and license manager 526, the example logical object generation engine 528, the example event process manager 530, the example virtual rack manager directory 532, the example extensibility tools 534, the example configuration component services 536, the VRM configuration component 538, the example configuration UI 540, and/or, more generally, the example VRM 225, 227 of FIGS. 2, 4, and/or 5 is/are hereby expressly defined to include a tangible computer readable storage device or storage disk such as a memory, a digital versatile disk (DVD), a compact disk (CD), a Blu-ray disk, etc. storing the software and/or firmware. Further still, the example VRM 225, 227 of FIGS. 2, 4, and/or 5 may include one or more elements, processes and/or devices in addition to, or instead of, those illustrated in FIGS. 4 and/or 5, and/or may include more than one of any or all of the illustrated elements, processes and devices.

FIG. 7 depicts the example virtual server rack 206 of FIG. 2 with aggregate capacity across physical racks. In the illustrated example, the operations and management component 406 operates across the physical racks of the virtual server rack 206 to configure and manage numerous VMware vCenters (e.g., vCenter₁, vCenter₂, vCenterₙ) and numerous ESX Clusters (e.g., ESX Cluster₁, ESX Cluster₂, ESX Clusterₙ). In the illustrated example, the VMware vCenters are server managers to configure and manage workload domains in virtual infrastructures. In the illustrated example, the ESX Clusters are collections of ESX server hosts and associated virtual machines with shared resources and a shared management interface. The ESX Clusters are managed by the vCenter server managers. Each workload domain contains one vCenter server and can host one or more ESX Clusters.

The example operations and management component 406 treats multiple physical racks as a single pool of hardware in the virtual server rack 206. In this manner, the customer does not need to know where servers are physically located. When a new physical rack is added to the virtual server rack 206, the capacity of the newly added physical rack is added to the overall pool of hardware of the virtual server rack 206. Provisioning of that capacity is handled via workload domains.

FIG. 8 depicts example management clusters MGMT1 802 and MGMT2 804 in corresponding ones of the example physical racks 202, 204 of FIG. 2. In the illustrated example, the management clusters are per-rack, which is facilitated by the physical racks 202, 204 being built to be substantially similar or identical. One or more workload domains can be run on each management cluster 802, 804. In the illustrated example, management VMs are encapsulated in each corresponding management cluster 802, 804 and isolated from management VMs in other management clusters. The management clusters 802, 804 may include one or more components 806 such as, for example, a VMware NSX® network virtualization platform.

FIG. 9 depicts two example workload domains 902, 904 executing on the virtual server rack 206 of FIGS. 2 and 7. The example workload domains 902, 904 are used to provision capacity based on user inputs that specify one or more of domain type, security, availability requirements, performance requirements, and capacity requirements. Based on these user inputs, the operations and management component 406 determines whether a deployment is possible. If a deployment is possible, the operations and management component 406 determines an optimal host set that meets the user-specified requirements. The output of the operations and management component 406 is a fully configured system with suitable management components, capacity, and settings that meet the user-specified requirements.

In the illustrated example, the workload domains 902, 904 use a policy-driven approach to capacity deployment. The policy for each workload domain 902, 904 can be specified and changed by a user (e.g., a customer). Each of the example workload domains 902, 904 is an atomic unit for deployment, upgrading, and deletion. In the illustrated example, the workload domains 902, 904 are provided with algorithms that determine optimal host placement in the virtual server rack 206 to meet the user-provided requirements. The management components for each of the workload domains 902, 904 of the illustrated example run on one of the management clusters. Each management cluster can run on a single physical rack or across multiple physical racks as shown in FIG. 7, depending on availability and capacity requirements.

In the illustrated examples disclosed herein, domain types include an infrastructure as a service (IaaS) domain type, a platform as a service (PaaS) domain type, a desktop as a service (DaaS)/virtual desktop infrastructure (VDI) domain type, a development/test domain type, a production domain type, a Cloud Native domain type, an Openstack domain type, and a Big Data domain type. However, any other domain type may be used. In the illustrated example, security types include firewall settings, security group settings, particular specified IP addresses, and/or other network security features. In the illustrated example, availability requirements refer to durations of continuous operation expected for a workload domain. Example availability requirements also refer to configuring workload domains so that one workload's operability (e.g., malfunction, unexpected adverse behavior, or failure) does not affect the availability of another workload in the same workload domain. In the illustrated example, performance requirements refer to storage configuration (e.g., in terms of megabytes (MB), gigabytes (GB), terabytes (TB), etc.), CPU operating speeds (e.g., in terms of megahertz (MHz), gigahertz (GHz), etc.), and power efficiency settings. Example performance requirements also refer to configuring workload domains so that concurrent workloads in the same workload domain do not interfere with one another. Such non-interference between concurrent workloads may be a default feature or may be user-specified to different levels of non-interference. In the illustrated example, capacity requirements refer to the number of resources required to provide the availability, security, and/or performance requirements specified by a user. Allocating capacity into workload domains in accordance with the teachings of this disclosure enables providing workload domains with isolation from other workload domains in terms of security, performance, and availability. That is, security, performance, and availability for one workload domain can be made distinct and separate from security, performance, and availability of other workload domains. For example, techniques disclosed herein enable placing a workload domain on a single physical rack separate from other workload domains in other physical racks such that the workload domain can be physically isolated from other workload domains in addition to being logically isolated. Additionally, techniques disclosed herein facilitate placing a workload domain across numerous physical racks so that availability requirements of the workload domain are met even when one physical rack fails (e.g., if one physical rack fails, resources allocated to the workload domain from one or more other physical racks can ensure the availability of the workload domain).

An example of the operations and management component 406 of FIGS. 4, 5, 7, and 9 is illustrated in FIG. 10. The example operations and management component 406 includes an example policy manager 1002, an example policy enforcer 1004, an example deployment manager 1006, an example policy database 1008, an example resource manager 1010, and an example resource database 1012. In the illustrated example of FIG. 10, the policy manager 1002, the policy enforcer 1004, the deployment manager 1006, the policy database 1008, the resource manager 1010, and the resource database 1012 are all in communication with one another via a bus 1014. As disclosed herein, the example operations and management component 406 determines placement solutions for workload domains, manages the addition and/or removal of capacity according to policies, and deploys workload domains based on user-selected availability, performance, and capacity options. The example operations and management component 406 operates on a number of user requests concurrently to determine a number of placement solutions concurrently within a finite pool of shared configuration resources. Accordingly, the example operations and management component 406 services the number of user requests in a more timely fashion than is achievable without the disclosed techniques. For example, the operations and management component 406 identifies first ones of a plurality of computing resources to form a first placement solution for a first workload domain based on availability, performance, and capacity options selected by a first user, and concurrently identifies second ones of the plurality of computing resources different from the first ones of the plurality of computing resources to form a second placement solution for a second workload domain based on availability, performance, and capacity options selected by a second user.
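
For illustration only, the following Python sketch shows one way such concurrent servicing of requests against a finite shared pool could behave. The names (shared_pool, place_request), the thread-pool approach, and the host identifiers are assumptions of this sketch, not elements of the example apparatus.

import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shared pool of free hosts, guarded by a lock so that two
# concurrently computed placement solutions never reserve the same host.
shared_pool = {f"host-{i}" for i in range(8)}
pool_lock = threading.Lock()

def place_request(user, hosts_needed):
    # Reserve hosts_needed hosts for one user's workload domain, or fail.
    with pool_lock:
        if len(shared_pool) < hosts_needed:
            return user, None  # no placement solution with current capacity
        reserved = [shared_pool.pop() for _ in range(hosts_needed)]
    return user, reserved

requests = [("user-a", 3), ("user-b", 4), ("user-c", 3)]
with ThreadPoolExecutor() as executor:
    for user, hosts in executor.map(lambda r: place_request(*r), requests):
        print(user, "->", hosts if hosts else "insufficient capacity")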

The example policy manager 1002 determines availability options, performance options, and/or capacity options for a workload domain. In some examples, the policy manager 1002 creates, updates, or deletes one or more policies based on the availability options, performance options, and/or capacity options selected by a user. The example policy manager 1002 may communicate with a user interface to present options to a user and receive selections of such options from the user. In some examples, the policy manager 1002 determines availability options and performance options for a workload domain based on a user-selected workload domain type. As disclosed herein, a user may select domain types such as, for example, an IaaS domain type, a PaaS domain type, a DaaS/VDI domain type, a development/test domain type, a production domain type, a Cloud Native domain type, an Openstack domain type, a Big Data domain type, etc. In some examples, different domain types may be associated with one or more predetermined availability and/or performance options. For example, the policy manager 1002 may access a look-up table for default availability and/or performance options associated with the domain types described above. The example policy manager 1002 presents one or more availability and/or performance options to a user for selection thereof. In some examples, the policy manager 1002 presents the availability and/or performance options to a user at a low level of detail (e.g., low redundancy, normal redundancy, high redundancy 1, high redundancy 2, low performance, normal performance, high performance, etc.), such that the user need not understand the physical resources required to provide such availability and/or performance. In some examples, the policy manager 1002 presents the availability and/or performance options at a high level of detail (e.g., sliding scales representative of a number of redundant resources, CPU operating speeds, memory, storage, etc.).
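
As a minimal sketch of the kind of look-up table described above, consider the following Python fragment. The domain types are taken from the disclosure, but the default option values, the table name, and the function name are illustrative assumptions rather than the contents of the example policy database.

# Hypothetical defaults; real values would come from the policy database.
DEFAULT_OPTIONS = {
    "IaaS":     {"availability": "high redundancy 1", "performance": "normal"},
    "DaaS/VDI": {"availability": "normal redundancy",  "performance": "high"},
    "Dev/Test": {"availability": "low redundancy",     "performance": "low"},
}

def default_options(domain_type):
    # Return default availability/performance options for a domain type.
    try:
        return DEFAULT_OPTIONS[domain_type]
    except KeyError:
        raise ValueError(f"unsupported domain type: {domain_type}")

print(default_options("IaaS"))  # {'availability': 'high redundancy 1', 'performance': 'normal'}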

Based on the user-selected availability option(s) and/or performance option(s), the example policy manager 1002 determines one or more capacity option(s) capable of providing the user-selected availability option(s) and/or performance option(s). For example, the policy manager 1002 determines the number of resources required to provide the user-selected availability option(s) and/or performance option(s). In some examples, the policy manager 1002 determines and presents a number of capacity options to the user (e.g., four host resources could provide the user-selected availability option(s) and/or performance option(s), but five resources would be better). In some examples, the policy manager 1002 determines and presents one capacity option to the user. In some examples, the policy manager 1002 determines that no capacity options are available to the user based on the selected availability option(s) and/or performance option(s). In such examples, the policy manager 1002 presents to the user that there are no capacity options. In some such examples, the policy manager 1002 provides recommendations to the user for adjusting the availability option(s) and/or performance option(s) to make one or more capacity options available. In some such examples, multiple workload domains share a finite pool of computing resources such that capacity options may become unavailable due to a lack of resources. However, as disclosed herein, resources are allocated to different workload domains and/or de-allocated from workload domains such that capacity options may become available for the user-selected availability option(s) and/or performance option(s) at a later time. In some examples, portions of the shared pool of configurable computing resources are reserved to provide failure tolerance. In some examples, such reserved computing resources may be used when the policy manager 1002 determines that no non-reserved capacity options are available to the user based on the selected availability option(s) and/or performance option(s).
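
A simplified sketch of how capacity options might be derived from an availability selection is shown below. Mapping the availability option to a VSAN failures-to-tolerate (FTT) value, the 2*FTT+1 host minimum, and all names are assumptions of this sketch; the FTT overhead figures are consistent with Table 8, but the function is not asserted to be the disclosed algorithm.

import math

def capacity_options(requested_tb, ftt, host_raw_tb, free_hosts):
    # List host counts that can provide requested_tb of usable storage, given
    # the FTT implied by the availability option and the free hosts in the pool.
    usable_per_host = host_raw_tb / (1 + ftt)   # FTT overhead (see Table 8)
    minimum_hosts = max(math.ceil(requested_tb / usable_per_host), 2 * ftt + 1)
    # Offer the minimum and, when possible, one host of headroom.
    options = [n for n in (minimum_hosts, minimum_hosts + 1) if n <= free_hosts]
    return options  # an empty list means no capacity option is currently available

print(capacity_options(requested_tb=10, ftt=1, host_raw_tb=8, free_hosts=5))  # [3, 4]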

In some examples, a user wishes to create, update, delete, or otherwise modify the one or more policies created by the policy manager 1002 based on the availability, performance, and/or capacity options. For example, a user may want to increase capacity after a workload domain has been deployed. In such examples, the policy manager 1002 defines, updates, deletes, or otherwise modifies the one or more policies based on instructions received from the user (e.g., through the user interface). The policy manager 1002 stores information relating to the one or more policies in association with corresponding workload domains within the policy database 1008.

The example policy enforcer 1004 monitors the capacity of workload domains and compares the capacity of the workload domains to corresponding capacity policies (e.g., stored in the policy database 1008) to determine whether the capacity of the workload domain 902 is in compliance with a policy capacity specified in the user-defined policy for the workload domain 902. For example, if the workload domain 902 is associated with a user-defined policy having a first policy capacity and the workload domain 902 has a capacity different from the first policy capacity, the example policy enforcer 1004 determines that the workload domain 902 is in violation of the user-defined policy. In some examples, the workload domain 902 is in violation for having a capacity that exceeds the policy capacity specified in the user-defined policy (e.g., the policy capacity specified in the user-defined policy was lowered by the user). In some examples, the workload domain 902 is in violation for having a capacity less than the policy capacity specified in the user-defined policy (e.g., the policy capacity specified in the user-defined policy was increased by the user). In some examples, such violations occur due to modifications to user-defined policies after a workload domain has been deployed (e.g., in response to the policy manager 1002 defining, updating, deleting, or otherwise modifying the user-defined policy). Additionally or alternatively, compliance with a policy capacity may include the capacity of the workload domain 902 satisfying an acceptable capacity range (e.g., within +/-5%). For example, if the policy capacity specified in the user-defined policy is one hundred and the capacity of the workload domain 902 is ninety-nine, the capacity of the workload domain 902 may still be in compliance even though ninety-nine is less than one hundred (e.g., 99 is within 5% of 100). Accordingly, non-compliance with a policy capacity may include the capacity of the workload domain 902 not satisfying the acceptable capacity range (e.g., outside of +/-5%).
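
The acceptable-capacity-range check described above can be sketched as follows, assuming the +/-5% range; the function name and tolerance parameter are illustrative only.

def is_compliant(actual_capacity, policy_capacity, tolerance=0.05):
    # A capacity is compliant when it is within the acceptable capacity range
    # (here +/-5%) of the policy capacity specified in the user-defined policy.
    return abs(actual_capacity - policy_capacity) <= tolerance * policy_capacity

print(is_compliant(99, 100))   # True: 99 is within 5% of 100
print(is_compliant(90, 100))   # False: outside the acceptable range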

In some examples, the example policy enforcer 1004 categorizes existing workload domains based on a type of update to user-defined policies. For example, the example policy enforcer 1004 may group together workload domains having updates reflecting a request for additional CPU capacity, storage capacity, memory capacity, etc., or a request to release excess CPU capacity, storage capacity, memory capacity, etc. In such examples, the example policy enforcer 1004 determines whether there is a second workload domain within a same category as a first workload domain that has excess capacity and/or is requesting additional capacity.

The example deployment manager 1006 determines placement solutions for workload domains within the shared pool of configurable computing resources. The example deployment manager 1006 determines what resources to allocate for workload domains based on the availability, performance, and capacity options selected by users. In some examples, the deployment manager 1006 determines one or more placement solutions for one or more workload domains (e.g., from one or more users) concurrently, simultaneously, or substantially simultaneously. In such examples, the deployment manager 1006 communicates with the resource manager 1010 to request/receive a most recent list of accessible resources from the shared pool of configurable computing resources prior to determining a placement solution. In some examples, the deployment manager 1006 requests the most recent list of resources to prevent allocating resources that have been allocated to another workload domain (e.g., a first workload domain is to have a first set of resources and a second workload domain is to have a second set of resources different from the first set of resources). Various placement solutions may be used including selecting the least number of resources required to satisfy the capacity policy, selecting one more than the least number of resources required to satisfy the capacity policy, etc.

Once the deployment manager 1006 has a most recent list of accessible resources, the deployment manager 1006 determines a placement solution for a workload domain using the most recent list of accessible resources based on the availability, performance, and/or capacity options selected by a user. For example, if a user selects a multi-rack option, the deployment manager 1006 determines a placement solution in a virtual server rack across a plurality of physical racks (e.g., allocate resources across five different racks). In such examples, the deployment manager 1006 may allocate one resource per rack. Alternatively, the deployment manager 1006 may allocate all the resources of a first rack before moving to the next rack. In some examples, if a user selects a single-rack option, the deployment manager 1006 determines a vertical placement solution in a single physical rack (e.g., fill a single rack with one or more placement solutions).
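
For illustration, a minimal sketch of the two placement styles described above (vertical placement in a single rack, and filling racks before moving on) might look like the following. Host and rack names are hypothetical, and the functions are simplifications rather than the disclosed placement algorithms.

def place_vertical(free_hosts_by_rack, hosts_needed):
    # Vertical style: the whole cluster must fit in one rack, so pick the
    # first rack that can satisfy the request on its own.
    for rack, hosts in free_hosts_by_rack.items():
        if len(hosts) >= hosts_needed:
            return {rack: hosts[:hosts_needed]}
    return None  # no single rack can satisfy the request

def place_fill_racks(free_hosts_by_rack, hosts_needed):
    # Fill-racks style: consume rack 1 fully before moving to rack 2, and so on.
    placement, remaining = {}, hosts_needed
    for rack, hosts in free_hosts_by_rack.items():
        take = min(len(hosts), remaining)
        if take:
            placement[rack] = hosts[:take]
            remaining -= take
        if remaining == 0:
            return placement
    return None  # insufficient hosts across all racks

pool = {"rack1": ["h1", "h2"], "rack2": ["h3", "h4", "h5"]}
print(place_vertical(pool, 3))    # {'rack2': ['h3', 'h4', 'h5']}
print(place_fill_racks(pool, 3))  # {'rack1': ['h1', 'h2'], 'rack2': ['h3']}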

In some examples, when ones of the capacities of the plurality of workload domains are less than the policy capacities of the respective user-defined policies, the deployment manager 1006 is to concurrently determine a plurality of placement solutions for additional capacity for the plurality of workload domains based on a comparative analysis of: (a) the capacities of the plurality of workload domains, (b) updates to the respective user-defined policies, and (c) a resource database shared by the multiple users. In such examples, the resource manager 1010 is to allocate resources to the plurality of workload domains based on the plurality of placement solutions.

Examples for configuring and deploying workload domains, as disclosed herein, are shown in Table 8 below.

TABLE 8 Workload Domain Options
Feature: WRK01.02 Storage capacity calculation reflects usable storage after Failure-to-Tolerate (FTT). Description: The number of servers selected to fulfill a requested capacity takes VSAN Failures-to-Tolerate into account. Usable space in the Summary page is the amount of usable space after taking FTT into account. FTT overhead: for FTT = 0, usable space is 100% of host capacity; for FTT = 1, usable space is 50% of host capacity; for FTT = 2, usable space is 33% of host capacity. Acceptance Criteria: If the selected host capacity is X, the user will see the following values on the UI for usable space: FTT = 0 => X/(1 + FTT) = X; FTT = 1 => X/(1 + FTT) = X/2; FTT = 2 => X/(1 + FTT) = X/3.
Feature: WRK01.07 Placement Algorithm - Vertical - Single Rack. Description: This algorithm places clusters vertically in racks. It does not allow clusters to span racks. Servers need not be physically sequential as long as they are in the same rack. Acceptance Criteria: The following cases are used: validate that cluster hosts are selected based on first available by rack order; validate the error state when there are insufficient hosts to meet capacity requirements.
Feature: WRK01.07 Placement Algorithm - Fill Racks - Single Rack. Description: This placement algorithm creates clusters by filling racks vertically first before moving to the next rack, top to bottom and left to right starting with rack 1. Filling racks optimizes the capacity remaining in other racks for vertical placement. Acceptance Criteria: The following cases are used: validate that the algorithm selects the first hosts available in rack 1 and fails if enough hosts are not available.
Feature: WRK01.02 Feedback to user when no placement solution is found. Description: The Deploy Workload Domain workflow provides feedback to the user if there is no placement solution for the parameters they have requested. Example: “Insufficient Memory Capacity Available, please reduce memory requirements”; “Insufficient Storage Available to provide requested Capacity and Availability”. Acceptance Criteria: For fill racks, if the required resources are not available, this is displayed on the UI with a user-friendly message; for vertical placement, if the required resources are not available, this is displayed on the UI with a user-friendly message.
Feature: WRK01.04.01 Workload Domain Availability option - Low Redundancy. Description: Rack Striping: Yes for rack count > 1. VSAN FTT = 0. VSAN Fault Domains = No. vSphere HA = No. Max Size: max configured cluster size. Placement: Fill racks. Success Criteria: VSAN default policy includes the required parameters; cluster feature HA is not enabled.
Feature: WRK01.04.01 Workload Domain Availability option - High Redundancy Option 1. Description: Option 1 - single or multi-rack. Rack Striping: No. VSAN Fault Domains: No. vSphere HA = % of cluster per HA guidelines. VSAN FTT = 2 (requires 5 hosts minimum). Max Size: max hosts available in a single rack (max 22 in the current design with no reserve capacity). Placement: Vertical. Success Criteria: VSAN default policy includes the required parameters.
Feature: WRK01.03.01 Performance Options - Development Workloads. Description: VSAN Disk Stripes = 1. Host Power Management Active Policy = Low. Success Criteria: VSAN default policy includes the required parameters.
Feature: WRK01 IaaS Workload Domain Rules. Description: An IaaS Workload Domain maps to one or more vCenter Servers and vSphere clusters. It includes Capacity, Availability, and Performance policies applied to those clusters. A vCenter Server may manage only one workload domain, but this may include more than one vSphere Cluster as part of that domain. An IaaS Workload Domain may also include additional management components such as vRA or VIO. Acceptance Criteria: Inventory data validation and flexibility.
Feature: WRK01.07 Placement Algorithm - Fill Racks - Multi Rack - Validation. Description: This placement algorithm creates clusters by filling racks vertically first before moving to the next rack, top to bottom and left to right starting with rack 1. Filling racks optimizes the capacity remaining in other racks for vertical placement. Acceptance Criteria: The following cases are used: if all the hosts are free on a 2-rack setup, the first N nodes are selected from Rack 1; if Rack 1 is full, the other hosts are selected from Rack 2; if a few hosts are consumed on both Rack 1 and Rack 2, the selection algorithm still chooses the free hosts from Rack 1, and if more hosts are required it also chooses hosts from Rack 2.
Feature: WRK01.07 Placement Algorithm - Vertical - Multi-Rack - Validation. Description: This algorithm places clusters vertically in racks. It does not allow clusters to span racks. Servers need not be physically sequential as long as they are in the same rack. Acceptance Criteria: The following cases are used: if all the hosts are free on a 2-rack setup, the first N nodes are selected from Rack 1 to fulfill the requested capacity; if Rack 1 is not able to fulfill the requested capacity, the hosts are selected from Rack 2 if available; if a few hosts are consumed on both Rack 1 and Rack 2, the selection algorithm still chooses the free hosts from Rack 1 to fulfill the requested capacity, and if Rack 1 is not able to fulfill the requested capacity, the hosts are selected from Rack 2 if available.
Feature: WRK01.04.01 Workload Domain Availability option - High Redundancy Option 2. Description: Option 2 - multi-rack only, five racks minimum. Rack Striping: Yes. VSAN Fault Domains: Yes, strict; will require the customer to add additional hosts in some cases. vSphere HA = % of cluster per HA guidelines. Placement: Minimal Striping (compute or use table). VSAN FTT = 2 (requires 5 fault domains minimum). Max Size: max configured cluster size.
Feature: WRK01.07 Placement Algorithm - Minimal Striping. Description: The Minimal Striping algorithm stripes across the minimum number of racks required to meet VSAN Fault Domain requirements. The number of hosts in a fault domain should be even, meaning that the number of hosts in the cluster must be evenly divisible by the number of racks to stripe across; the integer result of this is the number of hosts in the fault domain. This can be computed during placement or a lookup table can be used.
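
Two of the calculations in Table 8 lend themselves to short worked sketches: the usable-space-after-FTT formula X/(1 + FTT), and the minimal striping rule for sizing fault domains. The following Python fragment illustrates both; it is a sketch under those stated assumptions, not the claimed implementation, and the function names are illustrative.

def usable_space(host_capacity, ftt):
    # Usable space after failures-to-tolerate, per Table 8: X / (1 + FTT).
    return host_capacity / (1 + ftt)

def minimal_striping(cluster_hosts, min_fault_domains):
    # Stripe across the fewest racks (fault domains) that divide the cluster
    # size evenly; return (racks, hosts per fault domain) or None if impossible.
    for racks in range(min_fault_domains, cluster_hosts + 1):
        if cluster_hosts % racks == 0:
            return racks, cluster_hosts // racks
    return None

print(usable_space(100, 2))     # 33.33...: with FTT = 2 about 33% of capacity is usable
print(minimal_striping(10, 5))  # (5, 2): 10 hosts striped across 5 racks, 2 per rack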

The example deployment manager 1006 communicates with the example resource manager 1010 to reserve the resources associated with the placement solution. After the resources are reserved, the example deployment manager 1006 deploys the workload domain with the reserved resources based on the user-selected availability, performance, and/or capacity options.

The example policy database 1008 stores information relating to user-selected options for deploying a workload domain. For example, when a user selects an availability option, a performance option, and/or a capacity option, the policy manager 1002 may store this information in a user-defined policy corresponding to the workload domain. Additionally, the policy manager 1002 updates user-defined policies in the example policy database 1008 based on subsequent user selections. Such workload domain and user-defined policy pairings may be stored in one or more look-up tables within the example policy database 1008. In some examples, the example policy database 1008 is a tangible computer readable storage device or storage disk such as a memory, a digital versatile disk (DVD), a compact disk (CD), a Blu-ray disk, etc.

The example resource manager 1010 reserves resources from the shared pool of configurable computing resources based on placement solutions determined by the deployment manager 1006. In some examples, the resource manager 1010 allocates resources to and/or de-allocates resources from workload domains. In some examples, the resource manager 1010 allocates and/or de-allocates resources between workload domains. In some such examples, the resource manager 1010 determines whether one or more workload domains can provide resource capacity requested by another workload domain and/or whether one workload domain can provide resource capacity requested by one or more workload domains. The example resource manager 1010 tracks the reservation, allocation, and/or de-allocation of resources by storing information associated with such reservation, allocation, and/or de-allocation of resources in the example resource database 1012. In some examples, the resource manager 1010 communicates with one of the VRMs 225, 227 (FIG. 2), which communicates with the HMS 208, 214 to manage the physical hardware resources 224, 226.
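
A minimal sketch of this reserve/release bookkeeping against a shared resource database is shown below; the class name, data structures, and host identifiers are illustrative assumptions, not the disclosed resource manager.

class ResourceManager:
    # Sketch of reserve/release bookkeeping against a shared resource database.
    def __init__(self, free_hosts):
        self.free = set(free_hosts)        # stands in for the resource database
        self.allocated = {}                # workload domain -> set of hosts

    def reserve(self, domain, hosts):
        hosts = set(hosts)
        if not hosts <= self.free:
            return False                   # some requested host is already taken
        self.free -= hosts
        self.allocated.setdefault(domain, set()).update(hosts)
        return True

    def release(self, domain, hosts):
        owned = self.allocated.get(domain, set())
        hosts = set(hosts) & owned
        owned -= hosts                     # de-allocate from the workload domain
        self.free |= hosts                 # return the hosts to the shared pool

rm = ResourceManager(["h1", "h2", "h3", "h4"])
print(rm.reserve("wd-902", ["h1", "h2"]))  # True
print(rm.reserve("wd-904", ["h2", "h3"]))  # False: h2 is already reserved
rm.release("wd-902", ["h2"])
print(rm.reserve("wd-904", ["h2", "h3"]))  # True: h2 was de-allocated and is available again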

The example resource database 1012 stores information regarding the status of the shared pool of configurable resources such as, for example, resources allocated from the shared pool of configurable resources to workload domains and/or resources de-allocated from workload domains to the shared pool of configurable resources. The example deployment manager 1006 reads such status information to obtain a most recent list of available resources prior to determining a placement solution. In some examples, the example resource database 1012 is a tangible computer readable storage device or storage disk such as a memory, a digital versatile disk (DVD), a compact disk (CD), a Blu-ray disk, etc.

While an example manner of implementing the example operations and management component 406 of FIGS. 4, 5, 7 and/or 9 is illustrated in FIG. 10, one or more of the elements, processes and/or devices illustrated in FIG. 10 may be combined, divided, re-arranged, omitted, eliminated and/or implemented in any other way. Further, the example policy manager 1002, the example policy enforcer 1004, the example deployment manager 1006, the example policy database 1008, the example resource manager 1010, the example resource database 1012, and/or, more generally, the example operations and management component 406 of FIGS. 4, 5, 7 and/or 9 may be implemented by hardware, software, firmware and/or any combination of hardware, software and/or firmware. Thus, for example, any of the example policy manager 1002, the example policy enforcer 1004, the example deployment manager 1006, the example policy database 1008, the example resource manager 1010, the example resource database 1012, and/or, more generally, the example operations and management component 406 of FIGS. 4, 5, 7 and/or 9 could be implemented by one or more analog or digital circuit(s), logic circuits, programmable processor(s), application specific integrated circuit(s) (ASIC(s)), programmable logic device(s) (PLD(s)) and/or field programmable logic device(s) (FPLD(s)). When reading any of the apparatus or system claims of this patent to cover a purely software and/or firmware implementation, at least one of the example policy manager 1002, the example policy enforcer 1004, the example deployment manager 1006, the example policy database 1008, the example resource manager 1010, the example resource database 1012, and/or, more generally, the example operations and management component 406 of FIGS. 4, 5, 7 and/or 9 is/are hereby expressly defined to include a tangible computer readable storage device or storage disk such as a memory, a digital versatile disk (DVD), a compact disk (CD), a Blu-ray disk, etc. storing the software and/or firmware. Further still, the example operations and management component 406 of FIGS. 4, 5, 7 and/or 9 may include one or more elements, processes and/or devices in addition to, or instead of, those illustrated in FIG. 10, and/or may include more than one of any or all of the illustrated elements, processes and devices.

Flowcharts representative of example machine readable instructions that may be executed to deploy the example workload domains 902, 904 of FIG. 9 are shown in FIGS. 11A and 11B, and flowcharts representative of example machine readable instructions that may be executed to update the example workload domains 902, 904 of FIG. 9 are shown in FIGS. 12A and 12B. In these examples, the machine readable instructions implement programs for execution by a processor such as the processor 1812 shown in the example processor platform 1800 discussed below in connection with FIG. 18. The programs may be embodied in software stored on a tangible computer readable storage medium such as a CD-ROM, a floppy disk, a hard drive, a digital versatile disk (DVD), a Blu-ray disk, or a memory associated with the processor 1812, but the entire program and/or parts thereof could alternatively be executed by a device other than the processor 1812 and/or embodied in firmware or dedicated hardware. Further, although the example programs are described with reference to the flowcharts illustrated in FIGS. 11A, 11B, 12A, and 12B, many other methods of deploying, managing, and updating workload domains in accordance with the teachings of this disclosure may alternatively be used. For example, the order of execution of the blocks may be changed, and/or some of the blocks described may be changed, eliminated, or combined.

As mentioned above, the example processes of FIGS. 11A, 11B, 12A, and 12B may be implemented using coded instructions (e.g., computer and/or machine readable instructions) stored on a tangible computer readable storage medium such as a hard disk drive, a flash memory, a read-only memory (ROM), a compact disk (CD), a digital versatile disk (DVD), a cache, a random-access memory (RAM) and/or any other storage device or storage disk in which information is stored for any duration (e.g., for extended time periods, permanently, for brief instances, for temporarily buffering, and/or for caching of the information). As used herein, the term tangible computer readable storage medium is expressly defined to include any type of computer readable storage device and/or storage disk and to exclude propagating signals and to exclude transmission media. As used herein, “tangible computer readable storage medium” and “tangible machine readable storage medium” are used interchangeably. In some examples, the example processes of FIGS. 11A, 11B, 12A, and 12B may be implemented using coded instructions (e.g., computer and/or machine readable instructions) stored on a non-transitory computer and/or machine readable medium such as a hard disk drive, a flash memory, a read-only memory, a compact disk, a digital versatile disk, a cache, a random-access memory and/or any other storage device or storage disk in which information is stored for any duration (e.g., for extended time periods, permanently, for brief instances, for temporarily buffering, and/or for caching of the information). As used herein, the term non-transitory computer readable medium is expressly defined to include any type of computer readable storage device and/or storage disk and to exclude propagating signals and to exclude transmission media. As used herein, when the phrase “at least” is used as the transition term in a preamble of a claim, it is open-ended in the same manner as the term “comprising” is open ended. Comprising and all other variants of “comprise” are expressly defined to be open-ended terms. Including and all other variants of “include” are also defined to be open-ended terms. In contrast, the term consisting and/or other forms of consist are defined to be close-ended terms.

FIGS. 11A and 11B depict flowcharts representative of computer readable instructions that may be executed to implement the example operations and management component 406 (FIGS. 4, 5, 7, 9, and 10) to deploy workload domains. An example program 1100 is illustrated in FIG. 11A. Initially, at block 1102, the example policy manager 1002 receives a domain type of a workload domain (e.g., the example workload domain 902) specified by a user. For example, the policy manager 1002 instructs a user interface screen to be presented (e.g., via a user interface such as, for example, the configuration UI 540 of FIG. 5) via which the user may specify a domain type for a workload domain to configure and deploy. The example policy manager 1002 displays one or more availability option(s) and/or one or more performance option(s) corresponding to the received domain type to the user via the user interface screen (block 1104). In some examples, the policy manager 1002 may present a user interface screen 1500 of FIG. 15 to obtain specific performance options from a user. In the illustrated example of FIG. 15, slider controls 1502 are used to enable a user to specify a CPU requirement, a memory requirement, and/or a storage requirement. In some examples, the policy manager 1002 may present a user interface screen similar to the example performance and availability selection user interface screen 1600 of FIG. 16 to enable the user to select pre-defined availability and/or performance options for the workload domain 902. The example policy manager 1002 receives the user-selected availability and/or performance options as specified by the user (block 1106).

Based on the received user-selected availability and/or performance options specified by the user, the example policy manager 1002 determines and/or adjusts capacity options and displays the capacity options to the user (block 1108). In some examples, only available capacity options are presented to a user. For example, presenting to a user numerous capacity options that are not compatible with the availability and/or performance options could cause significant user frustration as the user uses trial and error in selecting any one or more of such unavailable capacity options. Instead, using examples disclosed herein, the policy manager 1002 analyzes the availability and/or performance options to determine capacity options that are available based on the selected availability and/or performance options so that a user can clearly see only those capacity options that are compatible with the selected availability and/or performance options. In some examples, capacity options are only dependent on the availability options. In such examples, the policy manager 1002 determines user-selectable capacity options based on the availability option at block 1108 but does not perform a similar analysis for performance options because all performance options of the virtual server rack 206 are selectable regardless of the availability option. The example policy manager 1002 receives the user-selected capacity options specified by the user (block 1110).

The example deployment manager 1006 computes a placement solution based on the availability, performance, and/or capacity options selected by the user (block 1112). In the illustrated example, placement refers to identifying the physical racks in which resources will be allocated for deploying the workload domain 902. In some examples, the deployment manager 1006 uses a placement algorithm based on the user-selected availability and/or performance options to compute the placement solution. For example, the placement algorithm causes the example deployment manager 1006 to determine how many host servers to allocate, the physical racks from which the host servers will be allocated, and which host servers to allocate. In some examples, the user-selected availability option causes the placement algorithm to allocate host servers from a single rack. In other examples, the availability option may allow host servers to be allocated from across numerous racks. In the illustrated example, the placement algorithm uses policies on availability to determine how to configure the placement of the workload domain 902.

Also at example block 1112, the deployment manager 1006 communicates with the resource manager 1010 to determine what hardware resources are available and/or to check the future availability capabilities of such hardware resources for implementing the availability and/or performance options selected by the user. In this manner, the deployment manager 1006 can determine which hardware resources in which physical racks meet the user-selected availability options specified at block 1106. In some examples, computing the placement solution includes obtaining a most recent list of accessible resources from the shared pool of configurable computing resources. For example, the resource manager 1010 tracks previous workload domain placement solutions and the resources allocated for such previous workload domain placement solutions. As resources are allocated, the resource manager 1010 removes such resources from the shared pool of configurable computing resources, such that subsequent placement solutions do not allocate the same resources. Similarly, as resources are de-allocated, the resource manager 1010 adds such resources to the shared pool of configurable computing resources, such that subsequent placement solutions can utilize such resources.

In some examples, the user-selected availability option causes the deployment manager 1006 to allocate host servers from a single rack. In other examples, the user-selected availability option may cause the deployment manager 1006 to allocate host servers from across numerous racks. In some examples, when host servers are to be allocated from across numerous racks, the deployment manager 1006 fills a rack with one or more workload domains before moving to the next rack. In some examples, when host servers are to be allocated from across numerous racks, the deployment manager 1006 allocates resources across the fewest number of racks that satisfies a fault domain requirement (e.g., at least three racks). In some examples, when host servers are to be allocated from across numerous racks, the deployment manager 1006 allocates resources across all the existing physical racks or across any number of physical racks with a limit on the number of physical racks involved. In the illustrated example, the deployment manager 1006 uses policies on availability to determine how to configure the placement of the workload domain 902. Example availability policy options are shown in Table 1300 of FIG. 13. Additional policy settings that may be specified by a user at block 1104 are shown in Table 1400 of FIG. 14.

In some examples, multiple placement solutions are to be computed simultaneously or substantially simultaneously. In such examples, the shared pool of configurable computing resources changes dynamically as multiple users attempt to deploy and/or update multiple workload domains. Accordingly, the example deployment manager 1006 first determines whether a solution has been found based on the availability, performance, and capacity options selected by the user and the most recent list of accessible resources (block 1114). For example, the deployment manager 1006 determines whether sufficient hardware resources in a single physical rack or across numerous physical racks have been found to meet the availability and/or performance options specified at block 1106. If a placement solution is not found (block 1114: NO), the example deployment manager 1006 presents a message indicating that no placement was found, and control returns to block 1104 with updated availability and/or performance options for the user to select. If a solution is found (block 1114: YES), the resource manager 1010 attempts to reserve the resources to prevent them from being used by another user (block 1115). If reservation of the resources is successful (block 1116: YES), the example resource manager 1010 removes the reserved resource(s) from the shared pool of configurable computing resources and control proceeds to block 1118. However, if reservation of the resources is not successful (e.g., due to the resources being allocated to another workload domain being deployed simultaneously or substantially simultaneously) (block 1116: NO), control returns to block 1104 with updated availability and/or performance options for the user to select.
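
The compute-then-reserve flow of blocks 1112 through 1116, including the retry when another user's concurrently deployed workload domain wins the race for the same resources, can be sketched as follows. The lock-protected set standing in for the shared pool, the retry count, and all names are assumptions of this sketch.

import threading

free_hosts = {"h1", "h2", "h3", "h4"}        # hypothetical shared pool of free hosts
pool_lock = threading.Lock()

def try_reserve(candidate):
    # Atomically reserve candidate hosts; fail if any were taken in the meantime.
    with pool_lock:
        if not candidate <= free_hosts:
            return False
        free_hosts.difference_update(candidate)
        return True

def deploy_with_retry(hosts_needed, max_attempts=3):
    # Compute a placement from the most recent list of free hosts, attempt to
    # reserve it, and retry if a concurrent deployment reserved those hosts first.
    for _ in range(max_attempts):
        snapshot = sorted(free_hosts)         # most recent list of resources
        if len(snapshot) < hosts_needed:
            return None                       # block 1114: NO, no placement solution
        candidate = set(snapshot[:hosts_needed])
        if try_reserve(candidate):
            return candidate                  # block 1118: deploy with these hosts
    return None                               # block 1116: NO, ask the user to reselect

print(deploy_with_retry(2))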

At block 1118, the example deployment manager 1006 deploys the workload domain 902. For example, the workload domain 902 is configured and deployed based on the user-selected domain type determined at block 1102, the user-selected availability option and/or the user-selected performance option determined at block 1106, and the user-selected capacity option determined at block 1110. The example program 1100 of FIG. 11A then ends. In some examples, before deploying the workload domain 902 at block 1118, the deployment manager 1006 may also request network configuration requirements from a user. For example, the deployment manager 1006 may present an example network configuration user interface screen 1700 of FIG. 17 to solicit network configuration requirements from the user. In addition, in some examples, the policy manager 1002 may request security requirements from a user. For example, the policy manager 1002 may present a security configuration user interface screen via which a user may specify security options (e.g., firewall settings, security group settings, specified IP addresses, etc.) to implement in connection with a requested workload domain. As such, the deployment manager 1006 may deploy the workload domain 902 at block 1118 based on user-specified network configuration requirements, security requirements, the domain type of block 1102, the user-selected availability option and the user-selected performance option determined at block 1106, and the user-selected capacity option determined at block 1110.

Although the example program 1100 of FIG. 11A is described in connection with configuring and deploying a single workload domain, the example program 1100 of FIG. 11A implemented in accordance with the teachings of this disclosure can be used in a multi-user scenario in which hundreds or thousands of users obtain workload domain services from the virtual server rack 206. For example, while manually configuring workload domains for such quantities of users would be overly burdensome or near impossible within required time constraints, examples disclosed herein may be used to process workload domain requests using the operations and management component 406 to configure and deploy large quantities of workload domains in an efficient and streamlined fashion without burdening and frustrating end users with long wait times to access such workload domains.

An example program 1120 is illustrated in FIG. 11B. Initially, at block 1122, the example policy manager 1002 receives a domain type specified by a user. For example, the example policy manager 1002 may present a user interface screen via which the user may specify a domain type for a workload domain to configure and deploy. The example policy manager 1002 receives an availability option specified by the user (block 1124). For example, the example policy manager 1002 may present a user interface screen similar to the example performance and availability selection user interface screen 1600 of FIG. 16 to enable the user to specify a particular availability for the workload domain 902 to be deployed.

The example deployment manager 1006 computes a placement option (block 1126). In the illustrated example, placement refers to locating the physical racks in which resources will be allocated for deploying the workload domain 902. The example deployment manager 1006 uses a placement algorithm based on the availability selection to compute the placement option. For example, the placement algorithm causes the example deployment manager 1006 to determine how many host servers to allocate, the physical racks from which the host servers will be allocated, and which host servers to allocate. In some examples, the user-requested availability option causes the placement algorithm to allocate host servers from a single rack. In other examples, the availability option may allow host servers to be allocated from across numerous racks. In the illustrated example, the placement algorithm uses policies on availability to determine how to configure the placement of the workload domain 902.

Also at example block 1126, the deployment manager 1006 communicates with the resource manager 1010 to determine what hardware resources are available and/or to check the future availability capabilities of such hardware resources. In this manner, the example deployment manager 1006 determines which hardware resources in which physical racks meet the availability options specified at block 1124.

The example deployment manager 1006 determines whether a solution has been found (block 1128). For example, the deployment manager 1006 determines whether sufficient hardware resources in a single physical rack or across numerous physical racks have been found to meet the availability options specified at block 1124. If a placement solution is not found (block 1128: NO), the example deployment manager 1006 presents a message indicating that no placement was found (block 1130), and control returns to block 1124 to receive a different availability option from the user.

If a solution is found (block 1128: YES), the resources are reserved to prevent them from being used by another user, and the example policy manager 1002 determines/adjusts capacity options and/or performance options selectable by a user (block 1132). For example, the example policy manager 1002 determines capacity options and/or performance options that are selectable by a user based on the placement solution determined at block 1126. In this manner, the example policy manager 1002 can present only capacity options and/or performance options that are useable with the determined placement option. For example, presenting to a user numerous capacity options and/or performance options that are not available or compatible with the placement solution could cause significant user frustration as the user uses trial and error in selecting any one or more of such unavailable capacity options and/or performance options. Instead, using examples disclosed herein, the example policy manager 1002 analyzes the placement solution to determine capacity options and/or performance options that are available based on the placement solution so that a user can clearly see only those capacity options and/or performance options that are compatible with the placement solution. In some examples, only capacity options are dependent on the placement solution, and performance options are independent of the placement solution. In such examples, the example policy manager 1002 determines user-selectable capacity options based on the placement solution at block 1132 but does not perform a similar analysis for performance options because all performance options of the virtual server rack 206 are selectable regardless of the placement solution.

The example policy manager 1002 presents the user-selectable capacity options and performance options at block 1134. For example, the policy manager 1002 may present the example resources selection user interface screen 1500 of FIG. 15 to obtain user-specified performance and capacity options from a user. In the illustrated example of FIG. 15, slider controls 1502 are used to enable a user to specify a CPU (performance) requirement, a memory (performance) requirement, and a storage (capacity) requirement. The example policy manager 1002 receives the capacity and performance options specified by the user (block 1136). The example deployment manager 1006 then deploys the workload domain 902 (block 1138). For example, the workload domain 902 is configured and deployed based on the domain type of block 1122, the availability option of block 1124, the placement solution of block 1126, and the capacity and performance options of block 1136. The example program 1120 of FIG. 11B then ends.

FIGS. 12A and 12B depict flowcharts representative of computer readable instructions that may be used to implement the operations and management component 406 of FIGS. 4, 5, 7, 9, and 10 to manage workload domains. An example user-interface program 1200 is illustrated in FIG. 12A. At block 1202, the example policy manager 1002 presents a policy management view to a user through a user interface. The policy management view allows users (e.g., customers) to create (e.g., define), update (e.g., change), and/or delete (e.g., remove) policies related to capacities of workload domains (block 1204). In some examples, the policy management view allows users to create, update, and/or delete policies relating to other options of workload domains such as, for example, security, availability, and/or performance options. The example policy manager 1002 stores information relating to the created, updated, and/or deleted policies in association with the corresponding workload domains in the example policy database 1008. Thereafter, the example user-interface program 1200 ends.

An example back-end program 1206 is illustrated in FIG. 12B. At block 1208, the example policy manager 1002 loads policies from the example policy database 1008. The example policy enforcer 1004 checks policy compliance at block 1210 by, for example, evaluating whether a capacity of a workload domain is in compliance with a capacity associated with a user-defined policy for the workload domain 902 (e.g., created, updated, or deleted at block 1204). At block 1212, the example policy enforcer 1004 determines whether there is a policy violation.

In some examples, the policy enforcer 1004 determines there is a violation when the capacity of the workload domain 902 does not match the policy capacity specified in the user-defined policy for the workload domain 902. For example, the policy enforcer 1004 determines a first policy capacity specified in the user-defined policy for the workload domain 902 at a first time (e.g., prior to the user-defined policy being updated) and compares the first policy capacity to a second policy capacity specified in the user-defined policy for the workload domain 902 at a second time (e.g., after the user-defined policy has been updated).

In some examples, the policy enforcer 1004 determines that the capacity of the workload domain 902 exceeds the policy capacity specified in the user-defined policy when the first policy capacity is greater than the second policy capacity. In some examples, the policy enforcer 1004 determines that the capacity of the workload domain 902 is less than the policy capacity specified in the user-defined policy when the first policy capacity is less than the second policy capacity. In some examples, the policy enforcer 1004 determines that the capacity of the workload domain 902 is in compliance with the policy capacity specified in the user-defined policy when the first policy capacity is identical to the second policy capacity. In some examples, the policy enforcer 1004 determines there is a policy violation only when the first policy capacity exceeds the second policy capacity by a threshold amount and/or when the first policy capacity is less than the second policy capacity by a threshold amount. In such examples, the threshold amount acts as a buffer to prevent constant allocation and/or de-allocation. In some such examples, the threshold amount may be plus or minus five percent of the total capacity.
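
The comparison at blocks 1210-1212, including the optional plus or minus five percent buffer described above, could be coded along the following lines; this is an illustrative assumption, not the patented implementation, and the return labels are hypothetical.

    def check_policy_compliance(first_policy_capacity: float,
                                second_policy_capacity: float,
                                total_capacity: float,
                                buffer_fraction: float = 0.05) -> str:
        """Compare the policy capacity at a first time (before an update) with
        the policy capacity at a second time (after the update)."""
        threshold = buffer_fraction * total_capacity
        difference = first_policy_capacity - second_policy_capacity
        if difference > threshold:
            return "excess-capacity"        # domain holds more than the updated policy allows
        if difference < -threshold:
            return "insufficient-capacity"  # domain holds less than the updated policy requires
        return "compliant"                  # within the buffer: no re-allocation triggered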

If the example policy enforcer 1004 determines there is no policy violation (e.g., the capacity of the workload domain 902 is in compliance with the policy capacity specified in the user-defined policy for the workload domain 902) (block 1212: NO), then control proceeds to block 1214. Otherwise (block 1212: YES), control proceeds to block 1216.

At block 1214, the example policy manager 1002 refreshes or otherwise reloads the policies. In some examples, the policy manager 1002 updates the user-defined policy according to instructions received from a user at block 1204. In such examples, the example policy enforcer 1004 reevaluates, in response to determining that the policy manager 1002 updated the user-defined policy, whether the capacity of the workload domain 902 is in compliance with the policy capacity specified in the user-defined policy, as disclosed above. In some examples, the example policy enforcer 1004 reevaluates whether the capacity of the workload domain 902 is in compliance with the policy capacity specified in the user-defined policy after a threshold amount of time has elapsed since the policy enforcer 1004 last evaluated whether the capacity of the workload domain 902 complied with the policy capacity. This process may continue to loop as policies are updated by users.
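
One possible shape for the refresh-and-reevaluate loop around blocks 1208-1214 is sketched below; the polling interval and the callable parameters (load_policies, check_compliance, handle_violation) are assumptions made for illustration.

    import time

    def enforcement_loop(load_policies, check_compliance, handle_violation,
                         recheck_interval_s: float = 60.0):
        """Reload policies, check compliance, and repeat after a threshold time."""
        while True:
            policies = load_policies()                          # block 1208 / block 1214
            for domain_id, policy in policies.items():
                violation = check_compliance(domain_id, policy)  # blocks 1210-1212
                if violation:
                    handle_violation(domain_id, violation)       # blocks 1216 onward
            time.sleep(recheck_interval_s)                       # threshold amount of time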

At block 1216, the example resource manager 1010 determines whether to add capacity to the workload domain 902 based on a type of policy violation. For example, the resource manager 1010 is to add capacity when the capacity of the workload domain 902 is less than the policy capacity specified in the user-defined policy and the resource manager is to not add capacity when the capacity of the workload domain 902 exceeds the policy capacity specified in the user-defined policy. Thus, if the example resource manager 1010 determines to add capacity to the workload domain 902 (block 1216: YES), control proceeds to block 1218. At block 1218, the example deployment manager 1006 determines a placement solution for additional capacity for the workload domain 902. For example, the deployment manager 1006 identifies first ones of a plurality of computing resources to form a placement solution for the workload domain 902 based on the difference between the current capacity of the workload domain 902 and the policy capacity of the user-defined policy, in view of user-selection of the availability, performance, and/or capacity options. The example deployment manager 1006 may determine a placement solution as disclosed above with reference to block 1112 (FIG. 11A) or block 1126 (FIG. 11B).
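
An illustrative sketch of the decision at block 1216 and the sizing of the placement request at block 1218 follows; the helper name find_placement_solution is hypothetical and stands in for the placement logic referenced at blocks 1112 and 1126.

    def reconcile_capacity(current_capacity: float, policy_capacity: float,
                           find_placement_solution):
        """Decide whether capacity must be added and, if so, size the request."""
        if current_capacity < policy_capacity:            # block 1216: YES
            shortfall = policy_capacity - current_capacity
            return find_placement_solution(shortfall)      # block 1218
        return None                                        # block 1216: NO -> de-allocation path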

If a placement solution is found (block 1220: YES), control proceeds to block 1222. Otherwise (block 1220: NO), the example back-end program 1206 ceases operation. At block 1222, the resource manager 1010 is to allocate resources to the workload domain 902 based on the placement solution determined at block 1218. In some examples, the allocated resources are immediately provisioned after allocation. Thereafter, the example resource manager 1010 updates the example resource database 1012 to remove the allocated resources from the shared pool of configurable resources (block 1224) and the example back-end program 1206 ends.

However, if the example resource manager 1010 determines to not add capacity to the workload domain 902 (block 1216: NO), control proceeds to block 1226. At block 1226, the resource manager 1010 is to de-allocate resources associated with excess capacity from the workload domain 902. In some examples, the de-allocated resources are de-provisioned prior to de-allocation. Thereafter, the example resource manager 1010 updates the example resource database 1012 to add the de-allocated resources to the shared pool of configurable resources (block 1224) and the example back-end program 1206 ends.
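
The resource database updates of block 1224 could be modeled as follows; this is a minimal sketch assuming a set-based stand-in for the shared pool of configurable resources (1012), and the class and method names are illustrative only.

    class ResourceDatabase:
        def __init__(self, shared_pool):
            self.shared_pool = set(shared_pool)   # resources free for any user
            self.allocations = {}                 # domain_id -> set of allocated resources

        def allocate(self, domain_id, resources):
            """Remove allocated resources from the shared pool (block 1224)."""
            resources = set(resources)
            self.shared_pool -= resources
            self.allocations.setdefault(domain_id, set()).update(resources)

        def deallocate(self, domain_id, resources):
            """Return de-allocated resources to the shared pool (block 1224)."""
            resources = set(resources)
            self.allocations.get(domain_id, set()).difference_update(resources)
            self.shared_pool |= resources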

In some examples, the policy enforcer 1004 is to evaluate whether capacities of a plurality of workload domains comply with policy capacities of policies defined by multiple users of the plurality of workload domains. In some such examples, the resource manager 1010 is to, when ones of the capacities of the plurality of workload domains exceed the policy capacities of the respective user-defined policies, de-allocate resources associated with excess capacity from the plurality of workload domains. In some such examples, the deployment manager 1006 is to, when ones of the capacities of the plurality of workload domains are less than the policy capacities of the respective user-defined policies, concurrently determine a plurality of placement solutions for additional capacity for the plurality of workload domains based on a comparative analysis of the capacities of the plurality of workload domains, updates to the respective user-defined policies, and the example resource database 1012 shared by the multiple users. In some such examples, the resource manager 1010 is to allocate resources to the plurality of workload domains based on the plurality of placement solutions.
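
One possible reading of the concurrent determination described above is sketched below using a thread pool; the helper find_placement_solution and the data shapes are assumptions made for the example.

    from concurrent.futures import ThreadPoolExecutor

    def determine_placements_concurrently(under_capacity_domains, find_placement_solution):
        """under_capacity_domains: mapping of domain_id -> capacity shortfall.
        Returns a mapping of domain_id -> placement solution."""
        with ThreadPoolExecutor() as pool:
            futures = {domain_id: pool.submit(find_placement_solution, shortfall)
                       for domain_id, shortfall in under_capacity_domains.items()}
            return {domain_id: future.result() for domain_id, future in futures.items()}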

As disclosed above, hundreds or thousands of users may update their respective policies requesting an increase or decrease in capacity of their respective workload domains. While manually updating workload domains for such quantities of users would be overly burdensome or near impossible within required time constraints, examples disclosed herein may be used to process workload domain requests to configure and/or update large quantities of workload domains for a plurality of users in an efficient and streamlined fashion without burdening and frustrating end users with long wait times to access such workload domains.

FIG. 12C is a flowchart illustrating example computer-readable instructions to implement block 1222 of FIG. 12B to allocate capacity to a first workload domain. The example implementation of block 1222 begins at block 1228. At block 1228, the example policy enforcer 1004 categorizes existing workload domains based on a type of update to user-defined policies. For example, the example policy enforcer 1004 may group together workload domains having updates reflecting a request for additional CPU capacity, storage capacity, memory capacity, etc., or a request to release excess capacity of the same type. At block 1230, the example policy enforcer 1004 determines whether there is a second workload domain within a same category as the first workload domain that has excess capacity. For example, where an update to the user-defined policy associated with the first workload domain reflects a request for additional capacity, the policy enforcer 1004 determines whether an update to the user-defined policy associated with the second workload domain reflects a request to release excess capacity. In such examples, the policy enforcer 1004 first looks to address updates to workload domains using other workload domains prior to utilizing the finite shared resources pool (e.g., due to its finite nature). If the example policy enforcer 1004 determines there are no other workload domains within the same category that have excess capacity (block 1230: NO), control proceeds to block 1232. At block 1232, the resource manager 1010 allocates the capacity requested by the update to the first workload domain from the finite shared resource pool. Thereafter, the example implementation of block 1222 ceases operation.

If the example policy enforcer 1004 determines there is another workload domain (e.g., the second workload domain) within the same category that has excess capacity (block 1230: YES), then control proceeds to block 1234. At block 1234, the resource manager 1010 determines whether the excess capacity associated with the second workload domain is greater than or equal to the capacity requested by the update to the first workload domain. If the resource manager 1010 determines the excess capacity associated with the second workload domain is less than the capacity requested by the update to the first workload domain (block 1234: NO), control proceeds to block 1236. At block 1236, the policy enforcer 1004 determines whether there is another workload domain (e.g., a third workload domain) within the same category as the first workload domain that has excess capacity. If the policy enforcer 1004 determines there is no other workload domain within the same category as the first workload domain that has excess capacity (block 1236: NO), control proceeds to block 1232. However, if the policy enforcer 1004 determines there is a third workload domain within the same category as the first workload domain that has excess capacity (block 1236: YES), control proceeds to block 1238.

At block 1238, the resource manager 1010 determines whether the excess capacity associated with the aggregate of the second and third workload domains is greater than or equal to the capacity requested by the update to the first workload domain. If the resource manager 1010 determines the excess capacity associated with the combination of the second and third workload domains is less than the capacity requested by the update to the first workload domain (block 1238: NO), control returns to block 1236. If the resource manager 1010 determines the excess capacity associated with the combination of the second and third workload domains is greater than or equal to the capacity requested by the update to the first workload domain (block 1238: YES) or if the resource manager 1010 determines the excess capacity associated with the second workload domain is greater than or equal to the capacity requested by the update to the first workload domain (block 1234: YES), control proceeds to block 1240. At block 1240, the example resource manager 1010 allocates the capacity requested by the update to the first workload domain from the workload domain(s) (e.g., second, third, fourth, etc. workload domains) with excess capacity. Thereafter, the example implementation of block 1222 ceases operation.
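
A hedged sketch of the FIG. 12C allocation logic (blocks 1228-1240) follows: donor domains in the same category are used only if together they cover the full request, otherwise the request falls back to the finite shared pool (block 1232). The data shapes and the shared_pool stand-in are assumptions made for the example.

    def allocate_requested_capacity(requested, donors, shared_pool, category):
        """donors: list of (domain_id, excess_capacity) in the same category.
        Returns a list of (source, amount) pairs covering the request."""
        plan, remaining = [], requested
        for domain_id, excess in donors:              # blocks 1230-1238: aggregate donors
            if remaining <= 0:
                break
            take = min(excess, remaining)
            plan.append((domain_id, take))
            remaining -= take
        if remaining > 0:                             # block 1232: donors cannot cover it
            shared_pool[category] -= requested        # serve the whole request from the pool
            return [("shared-pool", requested)]
        return plan                                   # block 1240: served from donor domains

    # Example: a 200 GB storage request covered by two donor domains.
    pool = {"storage": 10_000}
    print(allocate_requested_capacity(200, [("wd-2", 120), ("wd-3", 100)], pool, "storage"))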

FIG. 12D is a flowchart illustrating example computer-readable instructions to implement block 1226 of FIG. 12B to de-allocate capacity from the first workload domain. The example implementation of block 1226 begins at block 1242. At block 1242, the example policy enforcer 1004 categorizes existing workload domains based on a type of update to user-defined policies. At block 1244, the example policy enforcer 1004 determines whether there is a second workload domain within a same category as the first workload domain that is requesting additional capacity. For example, where an update to the user-defined policy associated with the first workload domain reflects a request to release excess capacity, the policy enforcer 1004 determines whether an update to the user-defined policy associated with the second workload domain reflects a request for additional capacity. In such examples, the resource manager 1010 first looks to address updates to workload domains using other workload domains prior to utilizing the finite shared resources pool (e.g., due to its finite nature). If the example policy enforcer 1004 determines there are no other workload domains within the same category that are requesting additional capacity (block 1244: NO), control proceeds to block 1246. At block 1246, the resource manager 1010 de-allocates the capacity from the first workload domain to the finite shared resource pool. Thereafter, the example implementation of block 1226 ceases operation.

If the example policy enforcer 1004 determines there is another workload domain (e.g., the second workload domain) within the same category that is requesting additional capacity (block 1244: YES), then control proceeds to block 1248. At block 1248, the resource manager 1010 determines whether the excess capacity associated with the first workload domain is greater than or equal to the capacity requested by the update to the second workload domain. If the resource manager 1010 determines the excess capacity associated with the first workload domain is less than the capacity requested by the update to the second workload domain (block 1248: NO), control proceeds to block 1250. At block 1250, the policy enforcer 1004 determines whether there is another workload domain (e.g., a third workload domain) within the same category as the first workload domain that has excess capacity. If the policy enforcer 1004 determines there is no other workload domain within the same category as the first workload domain that has excess capacity (block 1250: NO), control proceeds to block 1246. However, if the policy enforcer 1004 determines there is a third workload domain within the same category as the first workload domain that has excess capacity (block 1250: YES), control proceeds to block 1252.

At block 1252, the resource manager 1010 determines whether the excess capacity associated with the aggregate of the first and third workload domains is greater than or equal to the capacity requested by the update to the second workload domain. If the resource manager 1010 determines the excess capacity associated with the combination of the first and third workload domains is less than the capacity requested by the update to the second workload domain (block 1252: NO), control returns to block 1250. If the resource manager 1010 determines the excess capacity associated with the combination of the first and third workload domains is greater than or equal to the capacity requested by the update to the second workload domain (block 1252: YES) or if the resource manager 1010 determines the excess capacity associated with the first workload domain is greater than or equal to the capacity requested by the update to the second workload domain (block 1248: YES), control proceeds to block 1254. At block 1254, the example resource manager 1010 allocates the capacity requested by the update to the second workload domain from the workload domain(s) (e.g., first, third, fourth, etc. workload domains) with excess capacity. At block 1256, the policy enforcer 1004 determines whether all excess capacity associated with the first workload domain has been de-allocated. If the policy enforcer 1004 determines that not all excess capacity associated with the first workload domain has been de-allocated (block 1256: NO), control returns to block 1244. If the policy enforcer 1004 determines that all excess capacity associated with the first workload domain has been de-allocated (block 1256: YES), the example implementation of block 1226 ceases operation.
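
A simplified, hedged sketch of the FIG. 12D de-allocation direction (blocks 1242-1256) follows: the first domain's excess capacity is handed to requesting domains in the same category, looping until the excess is drained, and any remainder is returned to the shared pool (block 1246). This collapses the per-requester aggregation of blocks 1248-1252 into a single loop and uses illustrative data shapes.

    def release_excess_capacity(excess, requesters, shared_pool, category):
        """requesters: list of (domain_id, requested_capacity) in the same category.
        Returns a list of (destination, amount) pairs accounting for the excess."""
        plan = []
        for domain_id, requested in requesters:       # loop of blocks 1244-1256
            if excess <= 0:
                break
            give = min(requested, excess)
            plan.append((domain_id, give))
            excess -= give
        if excess > 0:                                # block 1246: no more requesters
            shared_pool[category] = shared_pool.get(category, 0) + excess
            plan.append(("shared-pool", excess))
        return plan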

FIG. 18 is a block diagram of an example processor platform 1800 capable of executing the instructions of FIGS. 11A, 11B, 12A, and/or 12B to implement the example operations and management component 406 of FIGS. 4, 5, 7, 9 and/or 10. The processor platform 1800 of the illustrated example includes a processor 1812. The processor 1812 of the illustrated example is hardware. For example, the processor 1812 can be implemented by one or more integrated circuits, logic circuits, microprocessors or controllers from any desired family or manufacturer.

The processor 1812 of the illustrated example includes a local memory 1813 (e.g., a cache), and executes instructions to implement the example operations and management component 406 or portions thereof. The processor 1812 of the illustrated example is in communication with a main memory including a volatile memory 1814 and a non-volatile memory 1816 via a bus 1818. The volatile memory 1814 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS Dynamic Random Access Memory (RDRAM) and/or any other type of random access memory device. The non-volatile memory 1816 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 1814, 1816 is controlled by a memory controller.

The processor platform 1800 of the illustrated example also includes an interface circuit 1820. The interface circuit 1820 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), and/or a PCI express interface.

In the illustrated example, one or more input devices 1822 are connected to the interface circuit 1820. The input device(s) 1822 permit(s) a user to enter data and commands into the processor 1812. The input device(s) can be implemented by, for example, an audio sensor, a microphone, a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, isopoint and/or a voice recognition system.

One or more output devices 1824 are also connected to the interface circuit 1820 of the illustrated example. The output devices 1824 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display, a cathode ray tube (CRT) display, a touchscreen, a tactile output device, a printer and/or speakers). The interface circuit 1820 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip or a graphics driver processor.

The interface circuit 1820 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem and/or network interface card to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 1826 (e.g., an Ethernet connection, a digital subscriber line (DSL), a telephone line, coaxial cable, a cellular telephone system, etc.).

The processor platform 1800 of the illustrated example also includes one or more mass storage devices 1828 for storing software and/or data. Examples of such mass storage devices 1828 include flash devices, floppy disk drives, hard disk drives, optical compact disk (CD) drives, optical Blu-ray disk drives, RAID systems, and optical digital versatile disk (DVD) drives.

Coded instructions 1832 representative of the example machine readable instructions of FIGS. 11A, 11B, 12A, and/or 12B may be stored in the mass storage device 1828, in the volatile memory 1814, in the non-volatile memory 1816, and/or on a removable tangible computer readable storage medium such as a CD or DVD.

From the foregoing, it will be appreciated that the above disclosed methods, apparatus and articles of manufacture manage workload domains based on changes to policy capacities after workload domain deployment. The examples disclosed herein compare capacities of workload domains for compliance with one or more policy capacities and add and/or remove resources to maintain compliance of the workload domains.

An example apparatus to manage a plurality of workload domains of multiple users comprises a policy enforcer to evaluate whether capacities of the plurality of workload domains comply with policy capacities of respective user-defined policies for the plurality of workload domains, a resource manager to, when ones of the capacities of the plurality of workload domains exceed the policy capacities of the respective user-defined policies, de-allocate resources associated with excess capacity from the plurality of workload domains, and a processor to, when ones of the capacities of the plurality of workload domains are less than the policy capacities of the respective user-defined policies, determine a plurality of placement solutions for additional capacity for the plurality of workload domains corresponding to the multiple users based on concurrent analysis of: (a) the capacities of the plurality of workload domains, (b) updates to the respective user-defined policies, and (c) a resource database shared by the multiple users, the resource manager to allocate resources to the plurality of workload domains based on the plurality of placement solutions.

In some examples, the resource manager is to update the resource database based on at least one of the de-allocation of the resources from the plurality of workload domains or the allocation of the resources to the plurality of workload domains.

In some examples, the apparatus further includes a policy manager to update the respective user-defined policies for the plurality of workload domains based on user input from respective ones of the multiple users.

In some examples, when ones of the capacities of the plurality of workload domains comply with the policy capacities of the respective user-defined policies, the policy enforcer is to, in response to determining that the policy manager updated the user-defined policies, reevaluate whether the capacities of the plurality of workload domains comply with the policy capacities of respective user-defined policies for the plurality of workload domains.

In some examples, to evaluate whether the capacities of the plurality of workload domains comply with the policy capacities of respective user-defined policies for the plurality of workload domains, the policy enforcer is to determine a first policy capacity of a first one of the user-defined policies for a first one of the plurality of workload domains at a first time, and compare the first policy capacity to a second policy capacity specified in the first one of the user-defined policies for the first one of the plurality of workload domains at a second time.

In some examples, the policy enforcer is to determine that a capacity of the first one of the plurality of the workload domains exceeds the first one of the user-defined policies when the first policy capacity exceeds the second policy capacity, determine that the capacity of the first one of the plurality of the workload domains is less than the first one of the user-defined policies when the first policy capacity is less than the second policy capacity, and determine that the capacity of the first one of the plurality of the workload domains complies with the first one of the user-defined policies when the first policy capacity is identical to the second policy capacity.

An example method to manage workload domains comprises evaluating, by executing an instruction with a processor, whether capacities of a plurality of workload domains comply with policy capacities of respective user-defined policies for the plurality of workload domains, when ones of the capacities of the plurality of workload domains exceed the policy capacities of the respective user-defined policies, de-allocating, by executing an instruction with the processor, resources associated with excess capacity from the plurality of workload domains, and, when ones of the capacities of the plurality of workload domains are less than the policy capacities of the respective user-defined policies, concurrently determining, by executing an instruction with the processor, a plurality of placement solutions for additional capacity for the plurality of workload domains based on a comparative analysis of: (a) the capacities of the plurality of workload domains, (b) updates to the respective user-defined policies, and (c) a resource database shared by multiple users, and allocating resources to the plurality of workload domains based on the plurality of placement solutions.

In some examples, the method further includes updating the resource database based on at least one of the de-allocation of the resources from the plurality of workload domains or the allocation of the resources to the plurality of workload domains.

In some examples, the method further includes updating the respective user-defined policies for the plurality of workload domains based on user input from respective ones of the multiple users.

In some examples, the method further includes, when ones of the capacities of the plurality of workload domains comply with the policy capacities of the respective user-defined policies, reevaluating whether the capacities of the plurality of workload domains comply with the policy capacities of respective user-defined policies for the plurality of workload domains in response to updating the respective user-defined policies.

In some examples, the method further includes reevaluating whether the capacities of the plurality of workload domains comply with the policy capacities of respective user-defined policies for the plurality of workload domains after a threshold amount of time has elapsed since the evaluating of whether the capacities of the plurality of workload domains comply with the policy capacities of respective user-defined policies for the plurality of workload domains.

In some examples, the evaluating of whether the capacities of the plurality of workload domains comply with the policy capacities of respective user-defined policies for the plurality of workload domains includes determining a first policy capacity of a first one of the user-defined policies for a first one of the plurality of workload domains at a first time, and comparing the first policy capacity to a second policy capacity specified in the first one of the user-defined policies for the first one of the plurality of workload domains at a second time.

In some examples, the method further includes determining that a capacity of the first one of the plurality of the workload domains exceeds the first one of the user-defined policies when the first policy capacity exceeds the second policy capacity, determining that the capacity of the first one of the plurality of the workload domains is less than the first one of the user-defined policies when the first policy capacity is less than the second policy capacity, and determining that the capacity of the first one of the plurality of the workload domains complies with the first one of the user-defined policies when the first policy capacity is identical to the second policy capacity.

Although certain example methods, apparatus and articles of manufacture have been disclosed herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all methods, apparatus and articles of manufacture fairly falling within the scope of the claims of this patent.

1.-20. (canceled)
 21. An apparatus comprising: interface circuitry; machine readable instructions; and programmable circuitry to execute the machine readable instructions to: determine available compute resources in a finite pool of compute resources; determine first compute resources of the available compute resources based on a first availability parameter for a first service, a first performance parameter for the first service, and a first capacity parameter for the first service, the first compute resources to execute the first service; and contemporaneously determine second compute resources of the available compute resources based on a second availability parameter for a second service, a second performance parameter for the second service, and a second capacity parameter for the second service, the second compute resources to execute the second service.
 22. The apparatus of claim 21, wherein the programmable circuitry is to, after a successful reservation of the first compute resources, cause the first compute resources to be removed from the available compute resources.
 23. The apparatus of claim 22, wherein the programmable circuitry is to, after a failed reservation of the second compute resources, determine third compute resources of remaining ones of the available compute resources, the third compute resources determined based on at least one of an updated availability parameter for the second service, an updated performance parameter for the second service, or an updated capacity parameter for the second service.
 24. The apparatus of claim 22, wherein the programmable circuitry is to, after a failed reservation of the second compute resources, cause display of an interface to access at least one of an updated availability parameter for the second service, an updated performance parameter for the second service, or an updated capacity parameter for the second service.
 25. The apparatus of claim 21, wherein the programmable circuitry is to update the available compute resources based on services deployed using the finite pool of compute resources.
 26. The apparatus of claim 21, wherein the first compute resources include a virtual server rack including resources located across a plurality of physical racks.
 27. The apparatus of claim 21, wherein the first compute resources include a physical rack.
 28. A non-transitory computer readable storage medium comprising instructions that cause one or more machines to at least: determine available compute resources in a finite pool of compute resources; determine first compute resources of the available compute resources based on a first availability parameter for a first service, a first performance parameter for the first service, and a first capacity parameter for the first service, the first compute resources to execute the first service; and contemporaneously determine second compute resources of the available compute resources based on a second availability parameter for a second service, a second performance parameter for the second service, and a second capacity parameter for the second service, the second compute resources to execute the second service.
 29. The non-transitory computer readable storage medium of claim 28, wherein the instructions cause the one or more machines to, after a successful reservation of the first compute resources, cause the first compute resources to be removed from the available compute resources.
 30. The non-transitory computer readable storage medium of claim 29, wherein the instructions cause the one or more machines to, after a failed reservation of the second compute resources, determine third compute resources of remaining ones of the available compute resources, the third compute resources determined based on at least one of an updated availability parameter for the second service, an updated performance parameter for the second service, or an updated capacity parameter for the second service.
 31. The non-transitory computer readable storage medium of claim 29, wherein the instructions cause the one or more machines to, after a failed reservation of the second compute resources, cause display of an interface to access at least one of an updated availability parameter for the second service, an updated performance parameter for the second service, or an updated capacity parameter for the second service.
 32. The non-transitory computer readable storage medium of claim 28, wherein the instructions cause the one or more machines to update the available compute resources based on services deployed using the finite pool of compute resources.
 33. The non-transitory computer readable storage medium of claim 28, wherein the first compute resources include a virtual server rack including resources located across a plurality of physical racks.
 34. The non-transitory computer readable storage medium of claim 28, wherein the first compute resources include a physical rack.
 35. A method comprising: determining, by executing an instruction with programmable circuitry, available compute resources in a finite pool of compute resources; determining, by executing an instruction with the programmable circuitry, first compute resources of the available compute resources based on a first availability parameter for a first service, a first performance parameter for the first service, and a first capacity parameter for the first service, the first compute resources to execute the first service; and contemporaneously determining, by executing an instruction with the programmable circuitry, second compute resources of the available compute resources based on a second availability parameter for a second service, a second performance parameter for the second service, and a second capacity parameter for the second service, the second compute resources to execute the second service.
 36. The method of claim 35, further including, after a successful reservation of the first compute resources, removing the first compute resources from the available compute resources.
 37. The method of claim 36, further including, after a failed reservation of the second compute resources, determining third compute resources of remaining ones of the available compute resources, the third compute resources determined based on at least one of an updated availability parameter for the second service, an updated performance parameter for the second service, or an updated capacity parameter for the second service.
 38. The method of claim 36, further including, after a failed reservation of the second compute resources, displaying an interface to access at least one of an updated availability parameter for the second service, an updated performance parameter for the second service, or an updated capacity parameter for the second service.
 39. The method of claim 35, further including updating the available compute resources based on services deployed using the finite pool of compute resources.
 40. The method of claim 35, wherein the first compute resources include a virtual server rack including resources located across a plurality of physical racks. 